Chapter 4 



Elements of Complex Analysis 



This chapter presents important concepts from the vast field of complex analysis. 

4.1 Functions of one complex variable 

Some functions of the complex plane are introduced in this section. These functions arise in a number of 
applications. 

Definition 4.1.1 (Polynomial) A (scalar) polynomial $p$ in the complex variable $s$ is a function of the form:

$$p(s) = \sum_{k=0}^{n} p_k s^k \tag{4.1}$$

where the polynomial coefficients $\{p_k\}$ are complex numbers. The degree of $p$ is $m$ if and only if $s^m$ is the
largest power with non-zero polynomial coefficient. The zero polynomial has no non-zero coefficient. Its
degree is 0. ★



A function of the form:

$$\sum_{k=0}^{n} \frac{p_k}{z^k}$$

is a polynomial in $1/z$. Now, the coefficients can be linear operators in general. For example, one may
consider polynomials in which the coefficients are matrices. Such polynomials are important in
multi-variable linear systems theory.



Definition 4.1.2 (Rational function) A (scalar) rational function $r$ in the variable $s$ is a function of the
form:

$$r(s) = \frac{p(s)}{q(s)} \tag{4.2}$$

where $p$ and $q \neq 0$ are polynomials in $s$. A rational function is real rational if and only if the coefficients of
its numerator and denominator polynomials are real numbers. A rational function is proper if and only if

$$\lim_{s \to \infty} r(s)$$

exists (as a complex number), in which case, the limit is denoted as $r(\infty)$. A proper rational function for
which $r(\infty) = 0$ is called strictly proper rational. ★

Example 4.1.1 Every polynomial is a rational function. The functions

$$2, \qquad \frac{1}{s+1}, \qquad \frac{s^2+3}{s+1}, \qquad s^3$$

are all real rational functions. The first is proper but not strictly proper, the second is strictly proper and
the last two are not proper. △

Rational functions have many important properties. We shall discuss some of them shortly. 

Definition 4.1.3 (Analytic function) A function is analytic at a point in the complex plane if and only if it
is differentiable at every point of some neighborhood of that point. A function that is analytic at every
point in an open set is said to be analytic in that open set. ★

Example 4.1.2 A polynomial $p$ is analytic at $1 \in \mathbb{C}$. In fact, it is analytic at every point in the complex
plane. The rational function 

is analytic at every point except 1. The function 

$$f(s) = \bar{s}$$

is not analytic. △

A non-trivial fact is that a function of a complex variable is differentiable once if and only if it is differen- 
tiable infinitely many times. Thus, analytic functions are infinitely differentiable at every point of analyticity. 
In contrast, there are functions of one real variable that are differentiable once but not twice. 

Definition 4.1.4 (Power series) A (formal) power series in the complex variable $s$ is of the form:

$$p(s) = \sum_{k=0}^{\infty} p_k s^k \tag{4.3}$$

where $\{p_k\}$ are complex numbers.

A formal power series reduces to a polynomial when all but a finite number of the coefficients are zero, i.e.,
$p_k = 0$ for all $k > n$ where $n$ is a positive integer. Polynomials as in (4.1) are finite sums and, hence, can
be evaluated to a complex number for any $s \in \mathbb{C}$. A power series on the other hand is an infinite sum and
may not be well-defined for some $s \in \mathbb{C}$. For example, the geometric series:

$$1 + s + s^2 + \cdots + s^n + \cdots$$

cannot be evaluated to a complex number at $s = 2$ as the partial sums diverge. The partial sums of the series
can be written (for $s \neq 1$) as:

$$1 + s + s^2 + \cdots + s^n = \frac{1 - s^{n+1}}{1 - s}$$

from which we may conclude that the series converges to

$$\frac{1}{1 - s}$$

for $|s| < 1$. This behavior is true of power series in general as the following theorem of Abel shows.
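
The behavior of the partial sums can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `geometric_partial_sum` and `closed_form` and the sample point `0.5 + 0.25j` are illustrative choices. It compares direct summation with the closed form inside the unit disc and lists the growing partial sums at $s = 2$.

```python
def geometric_partial_sum(s, n):
    """Return 1 + s + s**2 + ... + s**n by direct summation."""
    total = 0
    term = 1
    for _ in range(n + 1):
        total += term
        term *= s
    return total


def closed_form(s, n):
    """Closed form (1 - s**(n+1)) / (1 - s) of the partial sum, valid for s != 1."""
    return (1 - s ** (n + 1)) / (1 - s)


# Inside the unit disc the partial sums approach 1 / (1 - s).
s_inside = 0.5 + 0.25j
for n in (5, 20, 80):
    error = abs(geometric_partial_sum(s_inside, n) - 1 / (1 - s_inside))
    print(n, error)

# At s = 2 the partial sums grow without bound.
print([geometric_partial_sum(2, n) for n in range(6)])
```

The error printed for the point inside the disc shrinks as $n$ grows, while the partial sums at $s = 2$ keep increasing.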



Theorem 4.1.1 (Abel [1]) Let $p$ be a formal power series. There exists a number $R$ in $[0, \infty]$, called the
radius of convergence, with the following properties:

1. The series is absolutely convergent for every $s$ with $|s| < R$ and uniformly convergent for every $s$ with
$|s| \leq \rho < R$.

2. If $|s| > R$ the terms of the series are unbounded and the series diverges.

3. In $|s| < R$ the series is an analytic function. ■



The radius of convergence of the geometric series mentioned above is 1. Within the open unit disc, i.e. for
all $s$ with $|s| < 1$, the series sums up to the analytic function $1/(1 - s)$. Statement 3 of Abel's theorem says
that a power series is an analytic function within any open disc contained in the region of convergence of
the series. The converse is also true. If $f$ is an analytic function defined in an open disc of radius $R$, then it
has a power series expansion in that disc. A special case of this fact is the well-known Taylor series [1].



Theorem 4.1.2 (Taylor series) If $f$ is analytic in the open disc of radius $R$ centered at 0, then $f$ has the
power series representation:

$$f(s) = f(0) + \frac{f'(0)}{1!} s + \cdots + \frac{f^{(n)}(0)}{n!} s^n + \cdots$$

for all $s$ with $|s| < R$. ■



Our interest in power series comes partly from the following very important definition. 

Definition 4.1.5 (Exponential function) The power series

$$\sum_{k=0}^{\infty} \frac{s^k}{k!}$$

converges for each $s \in \mathbb{C}$ (that is, the radius of convergence is $\infty$) and is called the exponential function. It is
denoted as $e^s$. ★

$e^s$ is clearly analytic at every point in the complex plane. It is not a rational function. See [1] for a detailed
study of $e^s$ and power series.
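
As a quick numerical check of this definition, the sketch below sums the first terms of the series and compares the result with the standard library value `cmath.exp`. The function name `exp_series` and the default of 60 terms are assumptions made for illustration only.

```python
import cmath


def exp_series(s, terms=60):
    """Sum the first `terms` terms of the series sum_k s**k / k!."""
    total = 0j
    term = 1 + 0j  # the k = 0 term, s**0 / 0!
    for k in range(terms):
        total += term
        term *= s / (k + 1)  # turn s**k / k! into s**(k+1) / (k+1)!
    return total


s = 1 + 3j
print(abs(exp_series(s) - cmath.exp(s)))
```

For moderate $|s|$ the truncated sum agrees with `cmath.exp` to roughly machine precision; larger $|s|$ needs more terms.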

Definition 4.1.6 (Poles and zeros) Let $f$ be a scalar function of the complex variable $s$. A point $s_0$ in the
complex plane (including $\infty$) is a pole of $f$ if and only if

$$\lim_{s \to s_0} f(s) = \infty$$

A point $s_0 \in \mathbb{C}$ where $f(s_0) = 0$ is called a zero of $f$. ★

Example 4.1.3 The function $f(s) = s$ has a pole at $\infty$ and a zero at 0. The rational function

has a pole at $s = -1$ and a zero at $s = 1$. △

Proposition 4.1.1 Let $r$ be a proper rational function. Then, $r$ is analytic at $\infty$ and has a power series
expansion about $\infty$ of the form:

$$r(s) = \sum_{k=0}^{\infty} \frac{r_k}{s^k}$$

The coefficients $\{r_k\}$ are known as Markov parameters (in systems theory) or Taylor coefficients.



4.2 Evaluation of a function at an operator 

We now turn to the important concept of evaluating functions at matrices. In the above discussion, the
expression $f(s)$ meant that the function $f$ is being evaluated at the complex variable $s$. Thus, if $p$ is a
polynomial in $s$:

$$p(s) = \sum_{k=0}^{n} p_k s^k,$$

then $p(1 + 3j)$ is the value of $p$ at the point $1 + 3j$. To calculate this value, we replace every occurrence of
$s$ with $1 + 3j$:

$$p(1 + 3j) = \sum_{k=0}^{n} p_k (1 + 3j)^k$$

and evaluate the expression on the right hand side. This evaluation procedure can be extended to square
matrices.

Definition 4.2.1 (Polynomial evaluated at a square matrix) Let $p$ be a scalar polynomial of the complex
variable $s$:

$$p(s) = \sum_{k=0}^{n} p_k s^k$$

and $A \in \mathbb{C}^{m \times m}$. Then, $p$ evaluated at $A$, denoted by $p(A)$, is given by:

$$p(A) = \sum_{k=0}^{n} p_k A^k = p_0 I + p_1 A + p_2 A^2 + \cdots + p_n A^n$$

and is an $m \times m$ matrix. ★
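
The definition translates directly into code. The sketch below is one possible way to do it, assuming matrices are stored as lists of rows and using Horner's rule instead of forming each power of $A$ separately. The helper names `mat_mul` and `poly_at_matrix` are hypothetical and not from the text.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at_matrix(coeffs, A):
    """Evaluate p(A) = coeffs[0]*I + coeffs[1]*A + ... by Horner's rule."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)  # multiply the running value by A
        for i in range(n):
            result[i][i] += c        # add c * I
    return result


A = [[0, 1], [0, 0]]
print(poly_at_matrix([1, 2, 1], A))  # p(s) = 1 + 2s + s^2, prints [[1, 2], [0, 1]]
```

Since $A^2 = 0$ for this matrix, $p(A) = I + 2A$, which is what the sketch returns.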

Our next objective is to apply this evaluation concept to power series and analytic functions. There is,
however, a difficulty since a power series involves an infinite sum and it is not clear when

$$\sum_{k=0}^{\infty} p_k A^k$$

exists as a matrix. Polynomials are finite sums. So, they can be evaluated at any complex number $s$ or any
square matrix $A$. We need the following definition to resolve this difficulty.

Definition 4.2.2 (Spectral radius of a square matrix) Let $A$ be a square matrix. The spectral radius of $A$
is:

$$\rho(A) = \max_{i} |\lambda_i(A)|$$

where $\{\lambda_i(A)\}$ are the eigenvalues of $A$. That is, the spectral radius is the maximum of the absolute values
of the eigenvalues of $A$. ★

Theorem 4.2.1 Let $p$ be a power series:

$$p(s) = \sum_{k=0}^{\infty} p_k s^k$$

with radius of convergence $R > 0$. Then,

$$\sum_{k=0}^{\infty} p_k A^k$$

exists as a matrix for any square matrix $A$ whose spectral radius is strictly less than $R$. ■

The theorem can be proved using Jordan forms. It allows us to make the following definitions. 

Definition 4.2.3 (Analytic function evaluated at a square matrix) Let $f$ be a scalar analytic function
with power (Taylor) series expansion:

$$f(s) = \sum_{k=0}^{\infty} f_k s^k$$

whose radius of convergence is $R > 0$. Then, for any square matrix $A$ whose spectral radius is strictly less
than $R$, $f$ evaluated at $A$, denoted by $f(A)$, is given by:

$$f(A) = \sum_{k=0}^{\infty} f_k A^k$$

and is a square matrix of the same dimension as $A$. ★



Definition 4.2.4 (Exponential function evaluated at a square matrix) For any square matrix $A$, the
exponential function evaluated at $A$, denoted by $e^A$, is given by:

$$e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}$$

and is a square matrix of the same dimension as $A$. ★



It is very important to note that $e^A$ is not the exponential of the elements of $A$. The same holds for other
analytic functions evaluated at a matrix. The evaluation operation, that is, the act of evaluating a function at
a matrix, has many interesting properties. We list some of these below for general analytic functions as well
as for the exponential function.
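
The difference between $e^A$ and the entrywise exponential is easy to see on a small matrix. The following sketch sums a truncated version of the series in Definition 4.2.4. The helper names `mat_mul` and `mat_exp_series`, the cutoff of 30 terms and the test matrix are illustrative assumptions, and a truncated series is used only for demonstration.

```python
import math


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp_series(A, terms=30):
    """Sum the first `terms` terms of the series sum_k A**k / k!."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0 term
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]  # now A**k / k!
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
    return total


A = [[0.0, 1.0], [0.0, 0.0]]
series_value = mat_exp_series(A)                        # matrix exponential e^A
entrywise = [[math.exp(x) for x in row] for row in A]   # exponential of each entry
print(series_value)
print(entrywise)
```

Here $A^2 = 0$, so the series stops after two terms and $e^A = I + A$, whereas the entrywise exponential has $e^0 = 1$ in the lower-left corner and $e^1$ in the upper-right corner.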



Theorem 4.2.2 (Properties of evaluation) Let $f$ be an analytic function with radius of convergence $R$. Let
$A$ and $B$ be square matrices whose spectral radii are strictly less than $R$. The following statements are true.

1. Suppose that the Taylor coefficients of $f$ are real. Then, $f(A^*)$ exists and is equal to $f(A)^*$. That is,
$f$ evaluated at the complex conjugate transpose of $A$ is equal to the complex conjugate transpose of $f$
evaluated at $A$ (we say that evaluation and conjugate transposing commute).

2. Let $M$ be an invertible matrix of the same size as $A$. Then, $f(MAM^{-1})$ exists and is equal to
$M f(A) M^{-1}$.

3. Define:

$$C = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$$

i.e., the block-diagonal matrix whose diagonal blocks are $A$ and $B$. Then, $f(C)$ exists and

$$f(C) = \begin{bmatrix} f(A) & 0 \\ 0 & f(B) \end{bmatrix}$$

that is, $f$ evaluated at a block-diagonal matrix is the block-diagonal matrix obtained by evaluating $f$
at the diagonal blocks. ■



Theorem 4.2.3 (Properties of exponential function) The following statements are true. 

1. $e^0 = I$, i.e., the exponential function evaluated at the zero matrix is equal to the identity matrix.

2. Let $A$ be a square matrix. Then, $e^A$ is invertible and its inverse is given by $e^{-A}$.

3. Let $A$ and $B$ be square matrices. If $AB = BA$, then $e^{A+B} = e^A e^B = e^B e^A$. (The converse does
not hold in general.) ■

The definition of f(A) involves a power series in A and, to evaluate f(A), we must calculate the infinite 
sum. This may not be a good way to compute due to numerical errors. Fortunately, linear algebra provides 
efficient computational procedures which we discuss next. 

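Theorem 4.2.4 (Evaluating an analytic function - special cases) Let $f$ be an analytic function with
radius of convergence $R$ and $A$ be a matrix with spectral radius strictly less than $R$. The following statements
are true.

1. If $A$ is a diagonal matrix, then $f(A)$ is a diagonal matrix whose diagonal elements are $f$ evaluated
at the diagonal elements of $A$.

2. If $A$ is an upper-triangular $n \times n$ Jordan matrix:

$$A = J_n(\lambda) = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{bmatrix}$$

then $f(A)$ is the $n \times n$ upper-triangular matrix:

$$f(J_n(\lambda)) = \begin{bmatrix} f(\lambda) & f'(\lambda) & f''(\lambda)/2! & \cdots & f^{(n-2)}(\lambda)/(n-2)! & f^{(n-1)}(\lambda)/(n-1)! \\ 0 & f(\lambda) & f'(\lambda) & \cdots & f^{(n-3)}(\lambda)/(n-3)! & f^{(n-2)}(\lambda)/(n-2)! \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & f(\lambda) & f'(\lambda) \\ 0 & 0 & 0 & \cdots & 0 & f(\lambda) \end{bmatrix}$$

where $f^{(k)}(\lambda)$ denotes the $k$th derivative of $f$ evaluated at $\lambda$.

3. If $A$ is a Jordan matrix of the form:

$$A = \begin{bmatrix} J_{n_1}(\lambda_1) & 0 \\ 0 & J_{n_2}(\lambda_2) \end{bmatrix}$$

then $f(A)$ is given by:

$$f(A) = \begin{bmatrix} f(J_{n_1}(\lambda_1)) & 0 \\ 0 & f(J_{n_2}(\lambda_2)) \end{bmatrix}$$

where $f(J_{n_i}(\lambda_i))$ for $i = 1, 2$ are as in Statement 2. ■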



Accordingly, if A happens to have the special structure indicated in the theorem, then we can easily evaluate 
f(A) without actually adding up the terms of the infinite sum that defines f(A). Here is an example. 



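Example 4.2.1 Let

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

which is an upper-triangular Jordan matrix with eigenvalue $\lambda = 0$. Take $f$ to be the geometric series:

$$f(s) = \sum_{k=0}^{\infty} s^k$$

whose radius of convergence is $R = 1$. The spectral radius of $A$ is:

$$\rho(A) = \max_{i=1,2} |\lambda_i(A)| = 0$$

which is strictly less than $R$. Hence, we can apply Statement 2 of Theorem 4.2.4 to get:

$$f(A) = \begin{bmatrix} f(0) & f'(0) \\ 0 & f(0) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$

where we used the fact that

$$f'(s) = \sum_{k=1}^{\infty} k s^{k-1},$$

$f(0) = 1$ and $f'(0) = 1$. △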
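
The result of this example can be cross-checked by summing the series directly, since every power of this particular $A$ beyond the first is zero. The sketch below does so and also verifies that the sum is the inverse of $I - A$, in line with $f(s) = 1/(1 - s)$. The helper names `mat_mul` and `geometric_series_at_matrix` and the choice of 10 terms are assumptions for illustration.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def geometric_series_at_matrix(A, terms):
    """Return I + A + A**2 + ... with `terms` terms in total."""
    n = len(A)
    total = [[0] * n for _ in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # A**0 = I
    for _ in range(terms):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, A)
    return total


A = [[0, 1], [0, 0]]
f_A = geometric_series_at_matrix(A, 10)
I_minus_A = [[1, -1], [0, 1]]
print(f_A)                      # [[1, 1], [0, 1]]
print(mat_mul(f_A, I_minus_A))  # [[1, 0], [0, 1]]
```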



Now, if A does not have the special structure mentioned in Theorem 4.2.4, then a different procedure is 
needed. For this, we recall the Jordan decomposition from Chapter 3. 



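Theorem 4.2.5 (Complex Jordan form theorem) Let $A$ be an $n \times n$ matrix (real or complex). There exists
an invertible matrix $M$ such that

$$A = M J M^{-1}$$

where

$$J = \begin{bmatrix} J_{n_1}(\lambda_1) & 0 & \cdots & 0 \\ 0 & J_{n_2}(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{n_m}(\lambda_m) \end{bmatrix}$$

and $J_{n_k}(\lambda_k)$ is an $n_k \times n_k$ upper-triangular Jordan matrix with eigenvalue $\lambda_k$. The block-diagonal matrix $J$
is called the Jordan form of $A$. ■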

We conclude this section with the following procedure to evaluate an analytic function at a matrix.

Theorem 4.2.6 (Evaluating an analytic function) Let $f$ be an analytic function with radius of
convergence $R$ and $A$ be a matrix with spectral radius strictly less than $R$. The following statements are true.

1. Suppose that $A$ is diagonalizable. Let $M$ be the matrix whose columns are the linearly independent
eigenvectors of $A$ and $D$ be the diagonal matrix whose diagonal elements are the corresponding
eigenvalues of $A$, i.e.

$$A = M D M^{-1}$$

is the eigenvalue-eigenvector decomposition of $A$. Then,

$$f(A) = M f(D) M^{-1}$$

where $f(D)$ is the function $f$ evaluated at the diagonal matrix $D$.

2. Suppose that $A$ is non-diagonalizable. Let $J$ be the upper-triangular complex Jordan form of $A$ and
$M$ be the similarity transformation that puts $A$ in its Jordan form, i.e.

$$A = M J M^{-1}$$

Then,

$$f(A) = M f(J) M^{-1}$$

where $f(J)$ is the function $f$ evaluated at the block-Jordan matrix $J$. ■

A computer program to evaluate an analytic function $f$ at a matrix $A$ does the following sequence of
operations:

• Compute the eigenvalues and eigenvectors of $A$

• Calculate the spectral radius of $A$

• If the spectral radius is greater than or equal to the radius of convergence of $f$, then exit saying that
$f(A)$ cannot be evaluated

• Otherwise, check if there are $n$ linearly independent eigenvectors where $n$ is the size of $A$:

- If so, $A$ is diagonalizable and we apply Statement 1 of Theorem 4.2.6

- If not, we compute the Jordan form of $A$ and apply Statement 2 of Theorem 4.2.6 along with
Statements 2-3 of Theorem 4.2.4

Compared with summing the defining series term by term, this algorithm is much faster and more accurate,
even when evaluating polynomial functions.
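
Example 4.2.2 Let

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

which is a symmetric matrix. So, it is diagonalizable and we can write:

$$A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}^{-1}$$

Thus,

$$e^A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} e^3 & 0 \\ 0 & e^1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}^{-1} \quad \text{and} \quad \sin(A) = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \sin(3) & 0 \\ 0 & \sin(1) \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}^{-1}$$

Similarly, we can calculate $\cos(A)$, $\log(A)$, etc. △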
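
The diagonalization in this example can be checked numerically against a truncated power series. The sketch below hard-codes $M$, its inverse and the eigenvalues 3 and 1 from the example. The helper names `mat_mul`, `f_at_A` and `mat_exp_series` and the 40-term cutoff are illustrative assumptions, not a general-purpose implementation.

```python
import math


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


M = [[1.0, 1.0], [1.0, -1.0]]      # columns are eigenvectors of A
M_inv = [[0.5, 0.5], [0.5, -0.5]]  # inverse of M


def f_at_A(f):
    """Return M f(D) M^{-1} with D = diag(3, 1), as in Theorem 4.2.6."""
    f_D = [[f(3.0), 0.0], [0.0, f(1.0)]]
    return mat_mul(mat_mul(M, f_D), M_inv)


def mat_exp_series(A, terms=40):
    """Truncated series sum_k A**k / k!, used only as a cross-check."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


A = [[2.0, 1.0], [1.0, 2.0]]
exp_A = f_at_A(math.exp)
sin_A = f_at_A(math.sin)
series = mat_exp_series(A)
max_gap = max(abs(exp_A[i][j] - series[i][j]) for i in range(2) for j in range(2))
print(exp_A)
print(max_gap)
```

Working out the product by hand gives $e^A$ with diagonal entries $(e^3 + e)/2$ and off-diagonal entries $(e^3 - e)/2$, which is what both routes in the sketch approximate.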



Chapter 5 



Normed Linear Spaces and Banach Spaces 



Vector space structure permits addition and scalar multiplication of elements. However, it does not support 
important concepts such as size of elements and distance between elements. This chapter introduces the 
necessary machinery. 

5.1 Norms and normed linear spaces 

We have some intuitive notion of the size of an element. For example, if an element is multiplied by a scalar, 
then the size of the element should get scaled proportionately. The following definition captures this and 
other intuitive properties. Throughout this section, vector spaces are defined over $\mathbb{F}$ which can be either $\mathbb{C}$
or $\mathbb{R}$.

Definition 5.1.1 (Norm) Let $V$ be a vector space. A norm on $V$, denoted by $\|\cdot\|$, is a function from $V$ into
$\mathbb{R}$ with the following properties:

1. $\|x\| \geq 0$ for all $x \in V$.

2. $\|x\| = 0$ if and only if $x = 0$.

3. $\|\alpha x\| = |\alpha| \, \|x\|$ for all $x \in V$ and scalars $\alpha$.

4. $\|x + y\| \leq \|x\| + \|y\|$ for all $x$ and $y$ in $V$. ★

Property 1 says that the norm of a vector is greater than or equal to zero. So, we could think of a norm as a
mapping from $V$ into $[0, \infty)$ instead of $\mathbb{R}$ as in the definition. Property 2 is important. It says that there is
one and only one element whose norm is zero, namely the zero element of $V$. This property is used to prove
many results. Property 3 is the effect of scaling mentioned earlier. Property 4 is called the triangle inequality
and states that the norm of a sum is no larger than the sum of the norms.

Example 5.1.1 (p-norms or Hölder norms on $\mathbb{F}^n$) Consider the vector space $\mathbb{F}^n$ with the standard basis
$\{e_k\}_{k=1}^{n}$. Recall that $x \in \mathbb{F}^n$ can be written as:

$$x = \sum_{k=1}^{n} x_k e_k$$

where the scalars $\{x_k\}$ are called the coordinates of $x$. For $1 \leq p < \infty$, the $p$-norm of $x$ is given by:

$$\|x\|_p = \left( \sum_{k=1}^{n} |x_k|^p \right)^{1/p} \tag{5.1}$$

and, for $p = \infty$, the $\infty$-norm of $x$ is given by

$$\|x\|_\infty = \max_{k=1,2,\ldots,n} |x_k|$$

When $p = 2$, the definition (5.1) gives the 2-norm:

$$\|x\|_2 = \left( \sum_{k=1}^{n} \bar{x}_k x_k \right)^{1/2} = \left( \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \right)^{1/2} \tag{5.2}$$

and is also known as the Euclidean norm of $x$. △

It is important to notice that the definition of the $p$-norm of $x$ makes use of the coordinates of $x$ in the standard
basis. There is nothing special about the standard basis and any basis may be used to define $p$-norms
although the actual value of $\|x\|$ will depend on the basis used.
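
For coordinates in the standard basis, the $p$-norms of Example 5.1.1 can be computed with a few lines of code. This is a minimal sketch in which the function name `p_norm` and the use of `float("inf")` to select the $\infty$-norm are illustrative conventions, not notation from the text.

```python
def p_norm(x, p):
    """p-norm of a list of real or complex coordinates, for 1 <= p <= infinity."""
    if p == float("inf"):
        return max(abs(c) for c in x)
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= infinity")
    return sum(abs(c) ** p for c in x) ** (1.0 / p)


x = [3, -4]
print(p_norm(x, 1), p_norm(x, 2), p_norm(x, float("inf")))

# Triangle inequality (Property 4) on a sample pair of vectors.
y = [1, 2]
x_plus_y = [a + b for a, b in zip(x, y)]
print(p_norm(x_plus_y, 3) <= p_norm(x, 3) + p_norm(y, 3))
```

The restriction $p \geq 1$ is enforced because the triangle inequality fails for $0 < p < 1$.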



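Example 5.1.2 (Weighted 2-norms on $\mathbb{F}^n$) As before, consider the vector space $\mathbb{F}^n$ with standard basis
$\{e_k\}_{k=1}^{n}$. Let $w = \{w_k\}_{k=1}^{n}$ be a set of strictly positive numbers. The $w$-weighted Euclidean norm of $x$ in
$\mathbb{F}^n$ is:

$$\|x\|_w = \left( \sum_{k=1}^{n} w_k \bar{x}_k x_k \right)^{1/2} = \left( \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_n \end{bmatrix} \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \right)^{1/2}$$

This is just a special case of general weighted norms induced by positive definite matrices. △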



Example 5.1.3 (p-norms on continuous functions on [0, 1]) Consider the real vector space $C([0, 1])$ of
continuous functions $f : [0, 1] \to \mathbb{R}$. For $1 \leq p < \infty$, the $p$-norm of $f$ in $C([0, 1])$ is given by:

$$\|f\|_p = \left( \int_0^1 |f(t)|^p \, dt \right)^{1/p}$$

and, for $p = \infty$, the $\infty$-norm of $f$ is given by:

$$\|f\|_\infty = \max_{t \in [0,1]} |f(t)|$$

The $\infty$-norm is also known as the uniform norm. △

Example 5.1.4 (p-norms on continuous functions with compact support) Let $f : \mathbb{R} \to \mathbb{R}$ be a
continuous function. The support of $f$ is:

$$\operatorname{support} f = \overline{\{t \in \mathbb{R} : f(t) \neq 0\}}$$

that is, the closure of the set of points where $f$ is non-zero. If support $f$ is compact (closed and bounded, in
this case), then $f$ is said to be compactly supported. For example, the function:

$$f(t) = \begin{cases} 1 & \text{if } t \in (-1, 1) \\ 0 & \text{otherwise} \end{cases}$$

is compactly supported with support $[-1, 1]$, whereas the Gaussian function:

$$f(t) = e^{-t^2}$$

is supported on $\mathbb{R}$, and does not have compact support.

Let $C_c(\mathbb{R})$ be the set of all continuous functions $f : \mathbb{R} \to \mathbb{R}$ with compact support (note that different
functions in $C_c(\mathbb{R})$ may have different supports). It is easily shown that $C_c(\mathbb{R})$ is a vector space. For
$1 \leq p < \infty$, the $p$-norm of $f$ in $C_c(\mathbb{R})$ is given by:

$$\|f\|_p = \left( \int_{\mathbb{R}} |f(t)|^p \, dt \right)^{1/p}$$

and, for $p = \infty$, the $\infty$-norm of $f$ is given by:

$$\|f\|_\infty = \sup_{t \in \mathbb{R}} |f(t)| = \max_{t \in \mathbb{R}} |f(t)|$$

The $\infty$-norm is also known as the uniform norm. △

Definition 5.1.2 (Normed linear space) A normed linear space $(V, \|\cdot\|)$ is a vector space $V$ equipped with
a norm $\|\cdot\|$. ★



We have seen some examples of normed linear spaces already. It is very easy to construct more examples
using the procedures discussed next. Suppose that $(V, \|\cdot\|_V)$ is a normed linear space and $W$ is a vector
space isomorphic to $V$. Then, $\|\cdot\|_V$ induces a norm on $W$ in a natural way. This is the subject of the
following proposition.

Proposition 5.1.1 (Norms induced by isomorphisms) Let $(V, \|\cdot\|_V)$ be a normed linear space and $W$ be
a vector space. Suppose that $F : W \to V$ is an isomorphism between $W$ and $V$. Then, $\|\cdot\|_W$ defined by:

$$\|w\|_W = \|F(w)\|_V \quad \text{for all } w \in W$$

is a norm on $W$. ■

Example 5.1.5 (p-norms or Hölder norms on $\mathbb{C}^{m \times n}$) The above proposition can be used to generate
Hölder norms on $\mathbb{C}^{m \times n}$. To do this, we identify $\mathbb{C}^{m \times n}$ with $\mathbb{C}^{mn}$ with the isomorphism $F$:

$$P = \begin{bmatrix} p_1 & p_2 & \cdots & p_n \end{bmatrix} \mapsto \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{bmatrix}$$

where $p_i$ is the $i$th column of $P$, i.e., convert a matrix to a vector by stacking columns one below the other
(called column-major format in computer programming). Then, apply the Hölder norms for vectors defined
earlier. Thus,

$$\|P\|_k = \left\| \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{bmatrix} \right\|_k$$

for $k \geq 1$. △
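
The column-stacking isomorphism is simple to express in code. The following sketch assumes a matrix is stored as a list of rows. The names `vec` and `holder_norm_matrix` are hypothetical helpers introduced only for this illustration.

```python
def vec(P):
    """Stack the columns of a matrix (given as a list of rows) into one list."""
    rows, cols = len(P), len(P[0])
    return [P[i][j] for j in range(cols) for i in range(rows)]


def holder_norm_matrix(P, k):
    """Hölder k-norm of a matrix: the vector k-norm of its stacked columns."""
    v = vec(P)
    if k == float("inf"):
        return max(abs(c) for c in v)
    return sum(abs(c) ** k for c in v) ** (1.0 / k)


P = [[1, 2], [3, 4]]
print(vec(P))                                 # [1, 3, 2, 4]
print(holder_norm_matrix([[3, 0], [0, 4]], 2))
```

With $k = 2$ this construction gives the square root of the sum of squared absolute values of all entries, often called the Frobenius norm.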



Recall the definition of direct sum of subspaces given in Chapter 2 on Page 19. The next result shows how 
to make direct sum of normed linear spaces into a normed linear space. 

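Proposition 5.1.2 (Norms on direct sums of normed linear spaces) Let $(U, \|\cdot\|_U)$ and $(V, \|\cdot\|_V)$ be
normed linear spaces where $U$ and $V$ are subspaces of a vector space $W$. Define

$$\|\cdot\|_{UV,p} : U \oplus V \to \mathbb{R}$$

as follows:

$$\left\| \begin{bmatrix} u \\ v \end{bmatrix} \right\|_{UV,p} = \left( \|u\|_U^p + \|v\|_V^p \right)^{1/p}$$

for $1 \leq p < \infty$ and

$$\left\| \begin{bmatrix} u \\ v \end{bmatrix} \right\|_{UV,\infty} = \max \left\{ \|u\|_U, \|v\|_V \right\}$$

for $p = \infty$. Then, $(U \oplus V, \|\cdot\|_{UV,p})$ is a normed linear space. ■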



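Example 5.1.6 Let $(\mathbb{R}^n, \|\cdot\|_2)$ be the real vector space of column vectors of size $n$ with the Euclidean norm
and $(C([0, 1]), \|\cdot\|_2)$ be the real vector space of continuous functions on $[0, 1]$ with the 2-norm. The direct
sum of $\mathbb{R}^n$ and $C([0, 1])$:

$$\mathbb{R}^n \oplus C([0,1]) = \left\{ \begin{bmatrix} x \\ f \end{bmatrix} : x \in \mathbb{R}^n, \ f \in C([0,1]) \right\}$$

is composed of elements whose first component is a vector and the second component is a function.
With $p = \infty$, we have

$$\left\| \begin{bmatrix} x \\ f \end{bmatrix} \right\|_{\mathbb{R}^n \oplus C([0,1]), \infty} = \max \left\{ \|x\|_2, \|f\|_2 \right\} = \max \left\{ \left( \sum_{k=1}^{n} |x_k|^2 \right)^{1/2}, \left( \int_0^1 |f(t)|^2 \, dt \right)^{1/2} \right\}$$

as a norm on $\mathbb{R}^n \oplus C([0, 1])$. △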

We conclude this section with another way to generate normed linear spaces. 

Proposition 5.1.3 (Norms induced by restrictions) Let $(V, \|\cdot\|)$ be a normed linear space and $U$ be a
subspace of $V$. Define

$$\|x\|_U = \|x\| \quad \text{for all } x \in U$$

Then, $\|\cdot\|_U$ is called the restriction of $\|\cdot\|$ to $U$ and $(U, \|\cdot\|_U)$ is a normed linear space. ■

5.2 Induced metric, balls and sequences 

A norm measures the size of elements whereas a metric measures the distance between elements. These are 
different concepts in general. A norm always induces a natural metric but a metric may not induce a norm. 
A metric is needed to define important concepts regarding sequences and functions. These will lead us to 
Banach spaces in the next section. 

Definition 5.2.1 (Metric and metric space) Let $S$ be a set. A metric $d$ on $S$ is a function $d : S \times S \to \mathbb{R}$
with the following properties:

1. $d(x, y) \geq 0$ for all $x, y$ in $S$

2. $d(x, y) = 0$ if and only if $x = y$

3. $d(x, y) = d(y, x)$ for all $x, y$ in $S$

4. $d(x, z) \leq d(x, y) + d(y, z)$ for all $x, y, z$ in $S$

A metric space $(S, d)$ is a set equipped with a metric. ★

Compare this definition with that of norm given on Page 67 and observe the similarities. The main differ- 
ences are that (i) a norm is a function of a single variable, whereas a metric is a function of two variables, 
and (ii) the definition of a norm involves vector spaces, whereas metrics can be defined on any set. The 
following example illustrates the second difference. 

Example 5.2.1 Let $S = \{1, 3\}$. Define $d : S \times S \to \mathbb{R}$ as follows:

$$d(x, y) = \begin{cases} 1 & \text{if } x \neq y \\ 0 & \text{otherwise} \end{cases}$$

$d$ is a metric on $S$, but $S$ is not a vector space. △

Definition 5.2.2 (Induced metric) Let $(V, \|\cdot\|)$ be a normed linear space. The metric $d$ induced by the
norm $\|\cdot\|$ is defined as:

$$d(x, y) = \|x - y\|$$

for all $x, y$ in $V$. ★

Accordingly, every normed linear space is a metric space. The converse is not true as the following example 
shows. 

Example 5.2.2 Consider the vector space $\mathbb{R}$ and its subset $S = [1, 2]$. Define:

$$d(x, y) = |x - y|$$

for all $x, y$ in $S$. This is the standard metric on $S$. But, $S$ is not a vector space since $0 \notin S$. △

Rather than using a new notation d to denote an induced metric we will use the notation for norm. 

Definition 5.2.3 (Open and closed balls) Let $(V, \|\cdot\|)$ be a normed linear space. An open ball of radius $r$
centered at $x \in V$ is the set

$$\{y \in V : \|x - y\| < r\}$$

i.e., the set of all points that are at a distance strictly less than $r$ from $x$. When the inequality is non-strict,
the set is called a closed ball. ★

Example 5.2.3 (Ball in $\mathbb{R}^3$ with Euclidean norm) The terminology "ball" originates from the fact that
the definition results in a sphere in $\mathbb{R}^3$ with the Euclidean norm. △

Example 5.2.4 (Unit ball in $C([0, 1])$ with uniform norm) The uniform norm on the real vector space of
continuous functions on $[0, 1]$ is given by:

$$\|f\|_\infty = \max_{t \in [0,1]} |f(t)|$$

So, the unit ball in $C([0, 1])$ centered at $\sin(t)$ in this norm is:

$$\begin{aligned} \left\{ f \in C([0,1]) : \max_{t \in [0,1]} |f(t) - \sin(t)| < 1 \right\} &= \left\{ f \in C([0,1]) : |f(t) - \sin(t)| < 1, \ \forall t \in [0,1] \right\} \\ &= \left\{ f \in C([0,1]) : -1 < f(t) - \sin(t) < 1, \ \forall t \in [0,1] \right\} \\ &= \left\{ f \in C([0,1]) : -1 + \sin(t) < f(t) < 1 + \sin(t) \ \text{for all } t \in [0,1] \right\} \end{aligned}$$

where we used the fact that $\sin(t) \geq 0$ for all $t \in [0, 1]$ to derive the last equality. △

We now give two important applications of norms, namely convergence of sequences and continuity of
functions. Earlier in this chapter and in the previous chapter, we defined spaces of continuous functions 
without clarifying what is meant by continuity. 

Definition 5.2.4 (Boundedness and norm-convergence of a sequence) Let $(V, \|\cdot\|)$ be a normed linear
space and $\{x_k\}_{k=1}^{\infty}$ be a sequence in $V$.

1. The sequence is bounded if and only if there exists $M < \infty$ such that

$$\|x_k\| \leq M$$

for all $k$.

2. The sequence is said to converge in norm to $x \in V$ if and only if

$$\|x - x_k\| \to 0$$

as $k$ tends to $\infty$.

When a sequence converges to a point $x \in V$, we say that $x$ is the limit of the sequence. ★

Example 5.2.5 (A bounded non-converging sequence) The sequence $\{0, 1, 0, 1, \ldots\}$ is a bounded
non-converging sequence in the vector space $\mathbb{R}$ equipped with the standard norm. △

Definition 5.2.5 (Continuity of functions) Let $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ be normed linear spaces. Let $S$
be an open subset of $V$.

1. A function $f : S \to W$ is continuous at a point $x_0 \in S$ if and only if for each $\epsilon > 0$, there exists
$\delta > 0$ such that

whenever $\|x - x_0\|_V < \delta$ we have $\|f(x) - f(x_0)\|_W < \epsilon$

2. A function $f : S \to W$ is continuous in $S$ if and only if it is continuous at every point in $S$. ★

Example 5.2.6 (Numbers associated with matrices) Determinant and trace were not viewed as functions
in Chapter 3, but as numbers that can be attached to a square matrix. Here, we ask how these numbers
vary as the matrices change.

Let $(\mathbb{C}^{n \times n}, \|\cdot\|_2)$ and $(\mathbb{C}, \|\cdot\|_2)$ be respectively normed linear spaces of $n \times n$ matrices and complex numbers
with the 2-norm. Define the following functions from $\mathbb{C}^{n \times n}$ into $\mathbb{C}$:

$$f(M) = \text{determinant of } M$$

$$g(M) = \text{trace of } M$$

It can be shown that $f$ and $g$ are continuous functions. This means that small changes in the matrix will
only cause small changes in determinants and traces (although we don't know how small). △

5.3 Banach spaces

This section introduces the notion of a Banach space which is very important in engineering and science.
For example, a lot of the recent work in robust control is in the Banach spaces $\mathcal{H}_\infty(\mathbb{D})$, $L_1(\mathbb{R})$, etc. We
shall define these spaces along with other examples of Banach spaces.

The following definition is very important and lies at the heart of what we think convergence of a sequence 
ought to be. 

Definition 5.3.1 (Cauchy sequence) Let $(V, \|\cdot\|)$ be a normed linear space and $\{x_k\}_{k=1}^{\infty}$ be a sequence in
$V$. The sequence is called a Cauchy sequence if and only if, for each $\epsilon > 0$, there exists a positive integer
$N$ such that

$$\|x_m - x_n\| < \epsilon$$

for all $m > N$ and $n > N$. ★

Suppose that $\{x_k\}_{k=1}^{\infty}$ is a Cauchy sequence. Then, as we take larger and larger values of $k$, the distances
between elements of the sequence become arbitrarily small. More precisely, given any number $\epsilon > 0$, however
small it may be, we can find an index $N$ such that all elements of the form $x_{N+k}$, $k > 0$, are within a
distance of $\epsilon$ of each other.

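Example 5.3.1 The following sequence of matrices:

$$\left\{ \begin{bmatrix} 1/k & e^{-k} \\ 0 & 1 \end{bmatrix} \right\}_{k=1}^{\infty}$$

is a Cauchy sequence in $\mathbb{R}^{2 \times 2}$ with any of the $p$-norms. On the other hand, the following sequence of
matrices:

$$\left\{ \begin{bmatrix} 1/k & \sin(k) \\ 0 & 1 \end{bmatrix} \right\}_{k=1}^{\infty}$$

is not a Cauchy sequence. △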

Both sequences in the above example are bounded. This shows that bounded sequences are not necessarily 
Cauchy sequences. But, the following proposition says that Cauchy sequences are bounded. 

Proposition 5.3.1 (Cauchy sequences are bounded) Let $(V, \|\cdot\|)$ be a normed linear space and $\{x_k\}_{k=1}^{\infty}$
be a Cauchy sequence in $V$. Then, $\{x_k\}_{k=1}^{\infty}$ is bounded. ■



This proposition can be easily seen as follows. Given an index (a strictly positive integer) $N$, the
subsequence $\{x_{N+1}, x_{N+2}, \ldots\}$ is called the tail of the sequence $\{x_1, x_2, \ldots\}$ starting at $N$. The head of the
sequence ending at $N$ is $\{x_1, x_2, \ldots, x_N\}$. Note that the head of the sequence has only $N$ elements, whereas
the tail is an infinite sequence. Suppose that $\{x_k\}_{k=1}^{\infty}$ is a Cauchy sequence. Pick any $\epsilon > 0$. Then, by
definition, there is a strictly positive integer $N$ such that the elements of the tail of the sequence starting at
$N$ are within a distance of $\epsilon$ of each other. Put differently, if we take a ball of radius $2\epsilon$ centered at $x_N$, then
it will contain the tail starting at $N$. Now, the head of the sequence ending at $N$ has only a finite number
of elements and so, there is one, say $x_i$, that has the largest norm. It is easy to verify that the sequence is
contained in the ball of radius $\|x_i\| + 2\epsilon$ centered at 0.

To define the next concept, let us consider the set of rational numbers. These are all the real numbers that 
can be written as the ratio of two integers: 



For the moment as we are presently interested in sequences, we overlook the fact that Q is not a vector 
space over IR. On Q, we can define the metric induced by the absolute value. That is, the distance between 
two numbers is the absolute value of the difference between them. So, Q with the absolute value metric is a 
metric space. 

Let us consider the sequence of rational numbers: 

{1, 1.4, 1.41, 1.414, 1.4142, 1.41421, • • •} 

This is a Cauchy sequence, and the elements get closer and closer as we go towards the tail-end. But, it does 
not converge to any number in Q (it converges in IR to the number \pl which is not rational). This example 
shows that Cauchy sequences in metric spaces may not converge even though our initial feeling about the 
elements getting closer and closer would suggest otherwise. On the other hand, when viewed as a sequence 
in IR, the above sequence is still a Cauchy sequence and converges to a real number. 

Definition 5.3.2 (Complete metric space) A metric space is complete if every Cauchy sequence converges 
to a point in the metric space. ~k 

Thus, in a complete metric space, every Cauchy sequence converges to something in that space. When a 
metric space is not complete, it may be possible to add on limits of Cauchy sequences and obtain a complete 
metric space. This process is called completion. 

Example 5.3.2 (Metric spaces of rational numbers and real numbers) Let Q be the metric space of ra- 
tional numbers with absolute value metric. Q is not complete. Its completion ( with respect to absolute value 
metric) is IR with absolute value metric. IR with absolute value metric is a complete metric space. A 

Example 5.3.3 (Metric space that is not complete) Consider the normed linear space C([0, 1]) with the
2-norm. We have seen earlier that normed linear spaces are metric spaces. So, C([0, 1]) with the 2-norm is a
metric space. It is not complete, as there are Cauchy sequences of continuous functions that do not converge
to a continuous function. For instance, continuous ramps that rise from 0 to 1 ever more steeply around
t = 1/2 form a Cauchy sequence in the 2-norm, but their limit is a discontinuous step.



$$\mathbb{Q} = \left\{ x \in \mathbb{R} : x = \frac{n}{m} \text{ where } n, m \text{ are integers and } m \neq 0 \right\}$$
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Definition 5.3.3 (Banach Space) A complete normed linear space is called a Banach space. 



Banach spaces are very important and appear everywhere. We shall now list some important examples and 
results. For clarity, the rest of this section is divided into finite dimensional examples, spaces of sequences, 
Lebesgue spaces and Hardy spaces. It should be noted that many other equally important examples (such 
as Sobolev spaces) are not described. As a reminder, all vector spaces considered in this chapter are defined 
over either IR or C. 

5.3.1 Finite dimensional spaces 

In general, to prove that a certain vector space is Banach, we need to show completeness among other things. 
This is not required if the vector space is finite dimensional as the following theorem states. Recall that a 
vector space is said to be finite dimensional if and only if it has a basis containing only a finite number of 
elements. 

Theorem 5.3.1 A finite dimensional normed linear space is a Banach space. ■ 
Thus, all the finite dimensional normed linear spaces given previously are Banach spaces. 

5.3.2 Spaces of sequences 

Let $\mathbb{Z}_+$ denote the set of positive integers and $S(\mathbb{Z}_+)$ denote the set of all (one-sided) sequences of real
numbers:

$$S(\mathbb{Z}_+) = \{ (x_1, x_2, x_3, \ldots) : x_k \in \mathbb{R} \}$$

We can render this set a vector space by introducing point-wise addition and scalar multiplication.

Definition 5.3.4 (Space of sequences) For $1 \le p < \infty$, define the space of sequences $\ell_p(\mathbb{Z}_+)$ as:

$$\ell_p(\mathbb{Z}_+) = \left\{ [x_k]_{k=1}^{\infty} \in S : \sum_{k=1}^{\infty} |x_k|^p < \infty \right\}$$

and for $p = \infty$, define the space of sequences $\ell_\infty(\mathbb{Z}_+)$ as:

$$\ell_\infty(\mathbb{Z}_+) = \left\{ [x_k]_{k=1}^{\infty} \in S : \sup_{k} |x_k| < \infty \right\}$$

Definition 5.3.5 (p-norms on spaces of sequences) For $1 \le p < \infty$, define the p-norm of
$x = [x_k]_{k=1}^{\infty} \in \ell_p(\mathbb{Z}_+)$ as:

$$\|x\|_p = \left( \sum_{k=1}^{\infty} |x_k|^p \right)^{1/p}$$

and for $p = \infty$, define the $\infty$-norm of $x \in \ell_\infty(\mathbb{Z}_+)$ as:

$$\|x\|_\infty = \sup_{k = 1, 2, \ldots} |x_k|$$

These norms are known as $\ell_p$-norms.



Theorem 5.3.2 ($\ell_p$ spaces are Banach spaces) For $1 \le p \le \infty$, the space $\ell_p(\mathbb{Z}_+)$ with the associated
$\ell_p$-norm is a Banach space. ■
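
Membership in a sequence space depends on p. As a rough numerical illustration (the sequence $x_k = 1/k$ and the helper name `partial_p_norm` are choices made here, not taken from the text), the sketch below computes p-norms of finite truncations. For p = 2 the values settle near $\pi/\sqrt{6}$, while for p = 1 they keep growing, which is consistent with $x$ lying in $\ell_2$ but not in $\ell_1$.

```python
import math

def partial_p_norm(p: float, n_terms: int) -> float:
    """p-norm of the first n_terms entries of the sequence x_k = 1/k."""
    return sum((1.0 / k) ** p for k in range(1, n_terms + 1)) ** (1.0 / p)

for n_terms in (10, 1_000, 100_000):
    print(n_terms, partial_p_norm(1, n_terms), partial_p_norm(2, n_terms))

# Known closed form: sum of 1/k^2 is pi^2/6, so the 2-norm of x is pi/sqrt(6).
limit_2 = math.pi / math.sqrt(6.0)
```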

5.3.3 Lebesgue spaces 

Lebesgue spaces are usually defined by first introducing Lebesgue measure. As measure-theoretic consid- 
erations will take us too far away from the main theme, we state the definition of Lebesgue spaces and 
introduce associated norms. 

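Definition 5.3.6 (Lebesgue spaces of functions on $\mathbb{R}$) For $1 \le p < \infty$, the Lebesgue space $\mathcal{L}_p(\mathbb{R})$ is
given by:

$$\mathcal{L}_p(\mathbb{R}) = \left\{ f : \mathbb{R} \to \mathbb{R} : f \text{ is Lebesgue measurable and } \int_{\mathbb{R}} |f(t)|^p \, dt < \infty \right\}$$

and for $p = \infty$, the Lebesgue space $\mathcal{L}_\infty(\mathbb{R})$ is given by:

$$\mathcal{L}_\infty(\mathbb{R}) = \left\{ f : \mathbb{R} \to \mathbb{R} : f \text{ is Lebesgue measurable and } \operatorname*{ess\,sup}_{t \in \mathbb{R}} |f(t)| < \infty \right\}$$

where the integral is the Lebesgue integral, and ess sup stands for essential supremum.

Definition 5.3.7 (p-norms on Lebesgue spaces) For $1 \le p < \infty$, the p-norm of f in the Lebesgue space
$\mathcal{L}_p(\mathbb{R})$ is:

$$\|f\|_p = \left( \int_{\mathbb{R}} |f(t)|^p \, dt \right)^{1/p}$$

and for $p = \infty$, the $\infty$-norm of f in $\mathcal{L}_\infty(\mathbb{R})$ is:

$$\|f\|_\infty = \operatorname*{ess\,sup}_{t \in \mathbb{R}} |f(t)|$$

These norms are called $\mathcal{L}_p$-norms.

Theorem 5.3.3 (Lebesgue spaces are Banach) For $1 \le p \le \infty$, $\mathcal{L}_p(\mathbb{R})$ with the associated $\mathcal{L}_p$-norm is a
Banach space.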

The Lebesgue spaces $\mathcal{L}_1(\mathbb{R})$, $\mathcal{L}_2(\mathbb{R})$ and $\mathcal{L}_\infty(\mathbb{R})$ are very important in engineering. These are known
respectively as the Lebesgue spaces of integrable functions, square integrable functions and essentially
bounded functions. $\mathcal{L}_2(\mathbb{R})$ is also known as the space of signals of finite energy. $\mathcal{L}_\infty(\mathbb{R})$ is also known as
the space of persistently exciting signals.

We now examine $\mathcal{L}_p(\mathbb{R})$ for $p < \infty$ as completions of certain nice spaces. Consider the real vector space
$C_c(\mathbb{R})$ of compactly supported continuous functions $f : \mathbb{R} \to \mathbb{R}$ defined in (5.1.4). Earlier, we turned
$C_c(\mathbb{R})$ into a normed linear space by defining p-norms (or Hölder norms):

$$\|f\|_p = \left( \int_{\mathbb{R}} |f(t)|^p \, dt \right)^{1/p}$$

for $1 \le p < \infty$ and

$$\|f\|_\infty = \sup_{t \in \mathbb{R}} |f(t)|$$

for $p = \infty$. These are precisely the Lebesgue norms because, for continuous functions, Lebesgue integrals
become Riemann integrals and essential supremum becomes supremum.

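Theorem 5.3.4 (Completion of $C_c(\mathbb{R})$ in p-norms) For $1 \le p < \infty$, the completion of $C_c(\mathbb{R})$ in the
p-norm is $\mathcal{L}_p(\mathbb{R})$. ■

$\mathcal{L}_\infty(\mathbb{R})$ is the odd one out, as it is not the completion of $C_c(\mathbb{R})$ in the $\infty$-norm (uniform norm). In fact,
that completion is a proper subspace of $\mathcal{L}_\infty(\mathbb{R})$, namely the continuous functions that vanish at infinity.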

5.3.4 Hardy spaces 

The open unit disc:

$$\mathbb{D} = \{ z \in \mathbb{C} : |z| < 1 \}$$

and the open right half plane:

$$\mathfrak{R} = \{ z \in \mathbb{C} : \text{real part of } z > 0 \}$$

are the most common subsets of C arising in engineering applications. They define the region of analyticity 
of transfer functions of stable systems in discrete-time and continuous-time respectively (some engineers 
prefer the complement of ID as the region of analyticity). We shall define Hardy spaces of functions on the 
disc, but the definition can be adapted to the right half plane and, indeed, to any open subset of C. 

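Definition 5.3.8 (Hardy spaces of the unit disc) For $1 \le p < \infty$, the Hardy space $\mathcal{H}_p(\mathbb{D})$ is defined as:

$$\mathcal{H}_p(\mathbb{D}) = \left\{ f : \mathbb{C} \to \mathbb{C} \text{ analytic in } \mathbb{D} \text{ and } \sup_{0<r<1} \int_0^{2\pi} |f(re^{i\theta})|^p \, d\theta < \infty \right\}$$

and for $p = \infty$, the Hardy space $\mathcal{H}_\infty(\mathbb{D})$ is defined as:

$$\mathcal{H}_\infty(\mathbb{D}) = \left\{ f : \mathbb{C} \to \mathbb{C} \text{ analytic in } \mathbb{D} \text{ and } \sup_{0<r<1} \, \sup_{z : |z| = r} |f(z)| < \infty \right\}$$

Definition 5.3.9 (p-norms on Hardy spaces) For $1 \le p < \infty$, the p-norm on the Hardy space $\mathcal{H}_p(\mathbb{D})$ is
defined as:

$$\|f\|_p = \left( \frac{1}{2\pi} \sup_{0<r<1} \int_0^{2\pi} |f(re^{i\theta})|^p \, d\theta \right)^{1/p}$$

and for $p = \infty$, the $\infty$-norm on $\mathcal{H}_\infty(\mathbb{D})$ is given by:

$$\|f\|_\infty = \operatorname*{ess\,sup}_{0 \le \theta < 2\pi} |f(e^{i\theta})|$$

These norms are known as $\mathcal{H}_p$-norms.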



Theorem 5.3.5 (Hardy spaces are Banach) For $1 \le p \le \infty$, $\mathcal{H}_p(\mathbb{D})$ with the associated $\mathcal{H}_p$-norm is a
Banach space. ■

The spaces $\mathcal{H}_1(\mathbb{D})$, $\mathcal{H}_2(\mathbb{D})$ and $\mathcal{H}_\infty(\mathbb{D})$ are the most important in discrete-time engineering problems.
We can define Hardy spaces of analytic functions of the open right half plane analogously for
continuous-time problems. While the definitions of the $\mathcal{H}_p$-norms appear formidable, great simplifications
occur when we look at rational functions. In this case, these norms can be calculated from the Bode
magnitude plot. Recall that the Bode magnitude plot of a function f of the complex variable is the plot of

frequency $\omega$ vs $|f(j\omega)|$

If, instead of the magnitude, we plot

frequency $\omega$ vs $|f(j\omega)|^2$

we get the power spectral plot of f.

Proposition 5.3.2 (Bode plots and $\mathcal{H}_p$ norms) Let f be a rational function analytic in the open right half
plane. Let $\mathfrak{R}$ denote the open right half plane. The following statements are true.

1. f is in the Hardy space $\mathcal{H}_1(\mathfrak{R})$ if and only if the area under the Bode magnitude plot of f is finite.

2. f is in the Hardy space $\mathcal{H}_2(\mathfrak{R})$ if and only if the area under the power spectral plot of f is finite.

3. f is in the Hardy space $\mathcal{H}_\infty(\mathfrak{R})$ if and only if the maximum magnitude in the Bode plot is finite. ■
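
The three tests in the proposition can be tried numerically. The sketch below is only illustrative: the transfer function f(s) = 1/(s + 1), the helper names and the midpoint-rule integration are assumptions made here, not taken from the text. It estimates the areas over a growing frequency band [0, w_max].

```python
import math

def f_mag(omega: float) -> float:
    """|f(j*omega)| for the illustrative choice f(s) = 1/(s + 1)."""
    return 1.0 / math.hypot(1.0, omega)

def area(g, w_max: float, n: int = 100_000) -> float:
    """Midpoint-rule area under g over the frequency band [0, w_max]."""
    h = w_max / n
    return h * sum(g((i + 0.5) * h) for i in range(n))

areas = {}
for w_max in (1e1, 1e2, 1e3):
    magnitude_area = area(f_mag, w_max)                 # keeps growing with w_max
    power_area = area(lambda w: f_mag(w) ** 2, w_max)   # settles near pi/2
    areas[w_max] = (magnitude_area, power_area)
    print(f"w_max={w_max:8.0f}  magnitude area={magnitude_area:7.3f}  power area={power_area:.5f}")

peak = f_mag(0.0)  # for this f, the magnitude is largest at omega = 0
```

For this particular f, the area under the magnitude plot grows without bound as the band widens, while the power spectral area and the peak magnitude stay finite. By the proposition, this f would belong to the $\mathcal{H}_2$ and $\mathcal{H}_\infty$ spaces of the right half plane but not to $\mathcal{H}_1$.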
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Chapter 6 

Inner Product Spaces and Hilbert Spaces 

6.1 Inner products 

The dot product of two vectors is the product of their lengths and the cosine of the angle between the vectors. 
We have a good mental picture of angle between vectors in R 2 . The same cannot be said about the angle 
between functions in infinite dimensional spaces. Inner products generalize the concept of dot product and 
through it, we can give precise definition of angle between elements of any vector space. 

Definition 6.1.1 (Inner product, inner product space) Let V be a vector space over $\mathbb{C}$. A mapping
$\langle \cdot, \cdot \rangle : V \times V \to \mathbb{C}$ satisfying:

1. Positivity: $\langle x, x \rangle \ge 0$ for all $x \in V$.

2. Positive definite-ness: $\langle x, x \rangle = 0$ if and only if $x = 0$.

3. Linearity in the first argument:

$$\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle$$

for all x, y, z in V and scalars $\alpha$, $\beta$.

4. Hermitian-ness: $\langle x, y \rangle = \overline{\langle y, x \rangle}$ for all x, y in V.

is called an inner product on V.

V together with $\langle \cdot, \cdot \rangle$ is called an inner product space.

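Inner product is a mapping from $V \times V$ into the underlying field. When the underlying field is $\mathbb{C}$, properties 3
and 4 imply that the inner product is conjugate linear in the second argument, that is,

$$\begin{aligned}
\langle x, \alpha y + \beta z \rangle &= \overline{\langle \alpha y + \beta z, x \rangle} \\
&= \overline{\alpha \langle y, x \rangle + \beta \langle z, x \rangle} \\
&= \bar{\alpha} \, \overline{\langle y, x \rangle} + \bar{\beta} \, \overline{\langle z, x \rangle} \\
&= \bar{\alpha} \langle x, y \rangle + \bar{\beta} \langle x, z \rangle
\end{aligned}$$

so that we get the conjugates of $\alpha$ and $\beta$ instead of $\alpha$ and $\beta$. Now, for a real vector space V, an inner product
maps $V \times V$ into $\mathbb{R}$. In this case, the inner product is linear in both arguments and the Hermitian-ness in
property 4 becomes symmetry.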

Example 6.1.1 (Standard inner product on $\mathbb{R}^n$, $\mathbb{C}^n$) Consider the vector space $\mathbb{R}^n$. Define the following
map on $\mathbb{R}^n \times \mathbb{R}^n$ into $\mathbb{R}$:

$$\langle x, y \rangle_{\mathbb{R}^n} = y^T x$$

for all x, y in $\mathbb{R}^n$. It is easy to verify that this is an inner product on $\mathbb{R}^n$. It is the standard inner product on
$\mathbb{R}^n$. Note that the definition yields:

$$\langle x, x \rangle_{\mathbb{R}^n} = x^T x = \|x\|_2^2$$

that is, the inner product of x and x is the square of the Euclidean norm of x.
The standard inner product on $\mathbb{C}^n$ is given by:

$$\langle x, y \rangle_{\mathbb{C}^n} = y^* x$$

for all x, y in $\mathbb{C}^n$. We again get the square of the Euclidean norm as the inner product of x and x.
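
A small numerical check can make the role of the conjugate concrete. The sketch below is an illustration with arbitrarily chosen vectors and scalars; the function name `inner` is an assumption introduced here. It implements $\langle x, y \rangle = y^* x$ on $\mathbb{C}^2$ and compares both sides of the conjugate-linearity identity in the second argument.

```python
def inner(x, y):
    """Standard inner product on C^n: <x, y> = y* x = sum_k x_k * conj(y_k)."""
    return sum(xk * yk.conjugate() for xk, yk in zip(x, y))

# Arbitrary test data with integer real and imaginary parts (exact in floating point).
x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]
z = [-1 + 1j, 4 + 0j]
a, b = 2 + 3j, 1 - 2j

# Left side: <x, a*y + b*z>. Right side: conj(a) <x, y> + conj(b) <x, z>.
lhs = inner(x, [a * yk + b * zk for yk, zk in zip(y, z)])
rhs = a.conjugate() * inner(x, y) + b.conjugate() * inner(x, z)

# <x, x> is real and equals the squared Euclidean norm |1+2j|^2 + |3-1j|^2.
norm_sq = inner(x, x)
```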

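Example 6.1.2 (Standard inner product on $\mathbb{C}^{m \times n}$) Define the following map on $\mathbb{C}^{m \times n} \times \mathbb{C}^{m \times n}$ into $\mathbb{C}$:

$$\langle A, B \rangle_{\mathbb{C}^{m \times n}} = \operatorname{Tr}(AB^*)$$

for all A, B in $\mathbb{C}^{m \times n}$. Again, it is easy to verify that this is an inner product. It is the standard inner product
on $\mathbb{C}^{m \times n}$. Note that:

$$\langle A, A \rangle_{\mathbb{C}^{m \times n}} = \operatorname{Tr}(AA^*) = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = \|A\|_2^2$$

that is, the inner product of A and A is the square of the Hölder norm $\|A\|_2$.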

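Example 6.1.3 (Standard inner product on $\mathcal{L}_2(\mathbb{R})$) Recall the definition of the Lebesgue space $\mathcal{L}_2(\mathbb{R})$
from Chapter 5. Define the following map:

$$\langle f, g \rangle_{\mathcal{L}_2(\mathbb{R})} = \int_{\mathbb{R}} f(t) g(t) \, dt$$

for all f, g in $\mathcal{L}_2(\mathbb{R})$. This is the standard inner product on $\mathcal{L}_2(\mathbb{R})$. As before, we have:

$$\langle f, f \rangle_{\mathcal{L}_2(\mathbb{R})} = \int_{\mathbb{R}} f(t)^2 \, dt = \|f\|_2^2$$

which shows that the inner product of f and f is the square of the $\mathcal{L}_2$ norm of f.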
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The above examples allude to an important property. An inner product on a vector space induces a norm. 
Thus, every inner product space is also a normed linear space with the induced norm. Recall that a norm 
induces a natural metric and that a normed linear space is a metric space. We summarize these below. 

Proposition 6.1.1 (Norm induced by an inner product) Let V be an inner product space with inner
product $\langle \cdot, \cdot \rangle$. Define $\|\cdot\| : V \to [0, \infty)$ as follows:

$$\|x\| = \left( \langle x, x \rangle \right)^{1/2} \qquad (6.1)$$

for all x in V. Then, $\|\cdot\|$ is a norm on V. ■

The norm defined in the proposition is called the norm induced by the inner product. With this norm, V 
is a normed linear space. The difficult part in proving the proposition is showing triangle inequality. It is 
an important fact that the definition of an inner product is sufficient for triangle inequality to hold. Inner 
products also imply other inequalities some of which are summarized below. 

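Proposition 6.1.2 (Properties of inner products) Let $(V, \langle \cdot, \cdot \rangle)$ be an inner product space. As in
Proposition 6.1.1, let $\|\cdot\|$ be the induced norm. The following statements are true.

1. Cauchy-Schwarz inequality holds:

$$|\langle x, y \rangle| \le \|x\| \cdot \|y\|$$

for all x, y in V.

2. Polarization identity holds:

$$\langle x, y \rangle = \tfrac{1}{4} \langle x + y, x + y \rangle - \tfrac{1}{4} \langle x - y, x - y \rangle$$

for all x, y in V when V is a real vector space. When V is complex, the right hand side equals the
real part of $\langle x, y \rangle$.

3. Triangle inequality holds:

$$\|x + y\| \le \|x\| + \|y\|$$

for all x, y in V. ■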
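
These three properties are easy to spot-check numerically in a real inner product space. The sketch below is an illustration only: it uses $\mathbb{R}^3$ with the standard inner product, and the vectors and helper names are arbitrary choices made here. It evaluates both sides of each statement.

```python
import math

def inner(x, y):
    """Standard inner product on R^n."""
    return sum(xk * yk for xk, yk in zip(x, y))

def norm(x):
    """Norm induced by the inner product."""
    return math.sqrt(inner(x, x))

x = [3, -1, 2]
y = [1, 4, -2]
s = [xk + yk for xk, yk in zip(x, y)]   # x + y
d = [xk - yk for xk, yk in zip(x, y)]   # x - y

cauchy_schwarz_holds = abs(inner(x, y)) <= norm(x) * norm(y)
polarization_rhs = (inner(s, s) - inner(d, d)) / 4
triangle_holds = norm(s) <= norm(x) + norm(y)
```

A check on a single pair of vectors is of course not a proof; it only illustrates what each statement says.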

We conclude this section with a few more examples of inner products. 



Example 6.1.4 (Weighted inner products on $\mathbb{R}^n$) Let $A \in \mathbb{R}^{m \times n}$ be a matrix with rank n. Define the
following map:

$$\langle x, y \rangle_A = \langle Ax, Ay \rangle_{\mathbb{R}^m} = y^T (A^T A) x$$

for all x, y in $\mathbb{R}^n$. The map so defined is an inner product on $\mathbb{R}^n$. It induces a weighted 2-norm on $\mathbb{R}^n$.
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Example 6.1.5 (A norm that is not induced by an inner product) Consider the vector space C ([0, 1]) of 
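continuous functions on [0, 1] with the uniform-norm:

$$\|f\|_\infty = \max_{t \in [0, 1]} |f(t)|$$

This is a normed linear space. But, the uniform-norm is not induced by an inner product. One way to see
this is that every induced norm satisfies the parallelogram law $\|f + g\|^2 + \|f - g\|^2 = 2\|f\|^2 + 2\|g\|^2$, which
fails for f(t) = t and g(t) = 1 - t: the left hand side is 2 while the right hand side is 4.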
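
A standard way to test whether a norm can come from an inner product is the parallelogram law, $\|f + g\|^2 + \|f - g\|^2 = 2\|f\|^2 + 2\|g\|^2$, which every induced norm satisfies. The sketch below is an illustration, not part of the original argument: the functions f(t) = t and g(t) = 1 - t and the sampling-based helper `sup_norm` are choices made here. It evaluates both sides for the uniform norm on [0, 1].

```python
def sup_norm(f, n: int = 1001) -> float:
    """Approximate max |f(t)| on [0, 1] by sampling n equally spaced points."""
    return max(abs(f(i / (n - 1))) for i in range(n))

f = lambda t: t
g = lambda t: 1.0 - t

# Left and right sides of the parallelogram law in the uniform norm.
lhs = sup_norm(lambda t: f(t) + g(t)) ** 2 + sup_norm(lambda t: f(t) - g(t)) ** 2
rhs = 2 * (sup_norm(f) ** 2 + sup_norm(g) ** 2)
```

Here the two sides come out as 2 and 4, so the law fails for this pair of functions.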



6.2 Hilbert spaces 

As we have seen, every inner product space can be made into a normed linear space using the norm in- 
duced by the inner product. This permits us to consider sequences, convergence, Cauchy sequences and 
completeness on inner product spaces as was done in Chapter 5. The culmination of all those concepts is 
the following. 

Definition 6.2.1 (Hilbert space) A complete inner product space is called a Hilbert space. ~k 

Recall that a complete normed linear space is called a Banach space. Since every inner product space is also 
a normed linear space, Hilbert spaces are Banach spaces. Some examples of Hilbert spaces are given next. 

Example 6.2.1 (Finite dimensional examples) $\mathbb{R}^n$ with the standard inner product:

$$\langle x, y \rangle = y^T x$$

is a Hilbert space. The norm induced by this inner product is the Euclidean norm. Other finite dimensional
examples include $\mathbb{C}^n$ and $\mathbb{C}^{m \times n}$ with their respective standard inner products.

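Example 6.2.2 (Sequence space $\ell_2(\mathbb{Z}_+)$) The real one-sided sequence space:

$$\ell_2(\mathbb{Z}_+) = \left\{ [x_k]_{k=1}^{\infty} : \sum_{k=1}^{\infty} x_k^2 < \infty \right\}$$

with the inner product:

$$\left\langle [x_k]_{k=1}^{\infty}, [y_k]_{k=1}^{\infty} \right\rangle = \sum_{k=1}^{\infty} x_k y_k$$

is a Hilbert space.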

Example 6.2.3 (Lebesgue space of finite energy signals $\mathcal{L}_2(\mathbb{R})$) The Lebesgue space of square
integrable functions on $\mathbb{R}$ with the inner product:

$$\langle f, g \rangle = \int_{\mathbb{R}} f(t) g(t) \, dt$$

is a Hilbert space.

Example 6.2.4 (Hardy space of square integrable analytic functions $\mathcal{H}_2(\mathbb{D})$) The Hardy space of
functions that are analytic in the unit disc and square integrable on the unit circle with the inner product:

$$\langle f, g \rangle = \frac{1}{2\pi} \int_0^{2\pi} f(e^{i\theta}) \, \overline{g(e^{i\theta})} \, d\theta$$

is a Hilbert space.

A vector space isomorphism between a normed linear space V and a vector space W induces a norm on W 
turning it into a normed linear space. An analogous result for Hilbert spaces is the following. 

Proposition 6.2.1 (Inner product induced by vector space isomorphism) Let V be an inner product
space with inner product $\langle \cdot, \cdot \rangle_V$. Let W be a vector space and $F : V \to W$ be a vector space
isomorphism (that is, a linear one-to-one onto map). Define the following map on $W \times W$:

$$\langle x, y \rangle_W = \langle F^{-1}(x), F^{-1}(y) \rangle_V$$

for all x, y in W. Then, $\langle \cdot, \cdot \rangle_W$ is an inner product on W. ■

It is important to note that the value of the W-inner product of x and y is equal to the value of the V-inner
product of the pre-images of x and y. This suggests the following definitions.

Definition 6.2.2 (Inner product isomorphism) Let V and W be inner product spaces with inner products
$\langle \cdot, \cdot \rangle_V$ and $\langle \cdot, \cdot \rangle_W$ respectively. A map $F : V \to W$ is an inner product isomorphism if and only if

1. F is a vector space isomorphism, and

2. $\langle x, y \rangle_W = \langle F^{-1}(x), F^{-1}(y) \rangle_V$ for all x, y in W.

Definition 6.2.3 (Isomorphic inner product spaces) Two inner product spaces are isomorphic if and only
if there is an inner product isomorphism between them.



6.3 Orthogonality 

We shall now introduce the concept of orthogonality which is the key to exploiting the rich structure of 
Hilbert spaces. 

Definition 6.3.1 (Orthogonal vectors/sets, orthonormality) Let V be a Hilbert space with inner product
$\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$.

1. Let x and y be elements of V. x and y are said to be orthogonal to each other if and only if their
inner product is zero, i.e., $\langle x, y \rangle = 0$.

2. Let x be an element of V and S be a subset of V. x is said to be orthogonal to the set S if and only if
x is orthogonal to every element in S, i.e., $\langle x, y \rangle = 0$ for all $y \in S$.

3. Let S be a subset of V. S is said to be an orthogonal set if and only if each $x \in S$ is orthogonal to
the set $S \setminus \{x\}$, i.e., $\langle x, y \rangle = 0$ for all $x, y \in S$, $x \neq y$.

4. A vector x in V is said to be normal if and only if $\|x\| = 1$.

5. A set S is orthonormal if and only if it is orthogonal and every element in it is normal.

These definitions generalize the notion of a vector being perpendicular to another or a subset. It is important 
to recognize that orthogonality depends upon the inner product being used. Elements that are orthogonal to 
each other in some inner product may not be so in another inner product. Some examples will clarify these 
points. 

Example 6.3.1 (Orthogonal vectors in $\mathbb{R}^n$) Consider $\mathbb{R}^n$ as a Hilbert space with the inner product:

$$\langle x, y \rangle = y^T x$$

which induces the Euclidean norm. Recall the standard basis introduced in Chapter 2:

$$B = \{e_1, e_2, \ldots, e_n\} = \left\{ \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \ldots, \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \right\}$$

It is easily verified that

$$\langle e_i, e_j \rangle = e_j^T e_i = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

So, $e_i$ is orthogonal to $e_j$ whenever $i \neq j$. Every element of this basis is normal. Finally, this basis is an
orthonormal set.

Now, suppose that we change the inner product from the standard inner product to the following weighted
inner product:

$$\langle x, y \rangle_W = y^T W x$$

where W > 0 is a given non-diagonal matrix. $\mathbb{R}^n$ with this inner product is a Hilbert space. But, the set B,
which is still a basis for $\mathbb{R}^n$, is not necessarily orthonormal. This is because

$$\langle e_i, e_j \rangle_W = e_j^T W e_i$$

is the (j, i) entry of W, which is not necessarily 0 or 1.
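
The dependence of orthogonality on the inner product can be seen with a two-dimensional example. The sketch below is an illustration: the weight matrix W is a hypothetical positive definite matrix chosen here, and `weighted_inner` is a helper name introduced for this example. It tabulates the inner products of the standard basis vectors under both the standard and the weighted inner product.

```python
def weighted_inner(x, y, W):
    """<x, y>_W = y^T W x for a symmetric positive definite matrix W."""
    n = len(x)
    return sum(y[i] * W[i][j] * x[j] for i in range(n) for j in range(n))

identity = [[1, 0], [0, 1]]   # W = I recovers the standard inner product
W = [[2, 1], [1, 2]]          # hypothetical non-diagonal weight, eigenvalues 1 and 3
e1, e2 = [1, 0], [0, 1]

# Entry [i][j] holds the inner product of the i-th and j-th basis vectors.
standard_table = [[weighted_inner(a, b, identity) for b in (e1, e2)] for a in (e1, e2)]
weighted_table = [[weighted_inner(a, b, W) for b in (e1, e2)] for a in (e1, e2)]
```

Under the standard inner product the table is the identity, so the basis is orthonormal. Under the weighted inner product the table reproduces W, whose off-diagonal entries are nonzero and whose diagonal entries are not 1.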



Example 6.3.2 (An orthogonal set in $\mathcal{L}_2([0, 1])$) Recall the Lebesgue space $\mathcal{L}_2([0, 1])$ of square
integrable functions defined on [0, 1] with inner product:

$$\langle f, g \rangle = \int_0^1 f(t) g(t) \, dt$$

which is an infinite dimensional Hilbert space. Consider the following collection of functions in $\mathcal{L}_2([0, 1])$:

$$\{ \cos(2\pi k t) \}_{k=1}^{\infty}$$

We have:

$$\begin{aligned}
\langle \cos(2\pi k t), \cos(2\pi l t) \rangle &= \int_0^1 \cos(2\pi k t) \cos(2\pi l t) \, dt \\
&= \frac{1}{2} \int_0^1 \left( \cos(2\pi (k - l) t) + \cos(2\pi (k + l) t) \right) dt \\
&= \frac{1}{2} \begin{cases} \int_0^1 \left( 1 + \cos(4\pi k t) \right) dt & \text{if } k = l \\ \int_0^1 \left( \cos(2\pi (k - l) t) + \cos(2\pi (k + l) t) \right) dt & \text{otherwise} \end{cases} \\
&= \frac{1}{2} \begin{cases} 1 & \text{if } k = l \\ 0 & \text{otherwise} \end{cases}
\end{aligned}$$

which shows that the set is orthogonal.
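
The integrals in this example can be approximated numerically as a sanity check. The sketch below is illustrative: the midpoint rule, the grid size and the helper names `inner_L2` and `cos_k` are assumptions made here. It tabulates the inner products of $\cos(2\pi k t)$ and $\cos(2\pi l t)$ for k, l = 1, 2, 3.

```python
import math

def inner_L2(f, g, n: int = 2000) -> float:
    """Midpoint-rule approximation of the integral of f(t) g(t) over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n))

def cos_k(k: int):
    """The function t -> cos(2 pi k t)."""
    return lambda t: math.cos(2.0 * math.pi * k * t)

# table[k-1][l-1] approximates <cos(2 pi k t), cos(2 pi l t)>.
table = [[inner_L2(cos_k(k), cos_k(l)) for l in range(1, 4)] for k in range(1, 4)]
```

The diagonal entries come out close to 1/2 and the off-diagonal entries close to 0, matching the calculation above. The set is therefore orthogonal but not orthonormal; scaling each function by $\sqrt{2}$ would normalize it.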



Orthogonal sets have many nice properties. For example, if $\{x_k\}_{k=1}^{n}$ is an orthonormal set in the Hilbert
space $\mathbb{R}^n$ with standard inner product, then the matrix

$$X = [x_1 \; x_2 \; \cdots \; x_n]$$

is an orthonormal matrix (i.e., $X^T X = X X^T = I$). This is because the elements of the matrix $X^T X$ are
inner products of $x_i$ and $x_j$, which are either zero or one. The matrix $X^T X$ is an example of a Gram matrix
which we consider below. A further observation is that orthogonal sets are linearly independent.



Definition 6.3.2 (Gram matrix) Let V be a Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and $X = \{x_k\}_{k=1}^{m}$ be a
collection of elements in V. The Gram matrix associated with the collection of elements is:

$$G(X) = [\langle x_i, x_j \rangle]_{i,j=1}^{m}$$

and has size m x m.



Proposition 6.3.1 (Testing for orthogonality) Let V be a Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and
$\{x_k\}_{k=1}^{\infty}$ be a collection of elements in V. The following statements are true.

1. $\{x_k\}_{k=1}^{\infty}$ is an orthogonal set if and only if the Gram matrix G(Y) associated with any finite
sub-collection $Y = \{y_i\}_{i=1}^{m}$ of $\{x_k\}_{k=1}^{\infty}$ is diagonal and invertible.

2. $\{x_k\}_{k=1}^{\infty}$ is an orthonormal set if and only if the Gram matrix G(Y) associated with any finite
sub-collection $Y = \{y_i\}_{i=1}^{m}$ of $\{x_k\}_{k=1}^{\infty}$ is the identity matrix. ■
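
For a finite collection of vectors in $\mathbb{R}^n$ with the standard inner product, the Gram matrix test is straightforward to code. The sketch below is an illustration with arbitrarily chosen integer vectors; the function names are assumptions introduced here. It builds the Gram matrix and checks the diagonal-and-invertible condition.

```python
def inner(x, y):
    """Standard inner product on R^n."""
    return sum(xk * yk for xk, yk in zip(x, y))

def gram(vectors):
    """Gram matrix G[i][j] = <x_i, x_j>."""
    return [[inner(xi, xj) for xj in vectors] for xi in vectors]

def is_orthogonal_set(vectors) -> bool:
    """Diagonal-and-invertible test: zero off-diagonal entries, nonzero diagonal."""
    G = gram(vectors)
    m = len(G)
    off_diagonal_zero = all(G[i][j] == 0 for i in range(m) for j in range(m) if i != j)
    diagonal_nonzero = all(G[i][i] != 0 for i in range(m))
    return off_diagonal_zero and diagonal_nonzero

orthogonal = [[1, 1, 0], [1, -1, 0], [0, 0, 2]]
skewed = [[1, 1, 0], [1, 0, 0]]
```

With integer entries the comparisons with zero are exact; for floating-point data a tolerance would be needed instead.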

As a consequence, we have the following Pythagoras theorem.

Proposition 6.3.2 (Pythagoras theorem) Let $\{x_k\}_{k=1}^{n}$ be an orthogonal set in the Hilbert space V. Then,

$$\|x_1 + x_2 + x_3 + \cdots + x_n\|^2 = \|x_1\|^2 + \|x_2\|^2 + \|x_3\|^2 + \cdots + \|x_n\|^2$$

that is, the square of the norm of a sum of orthogonal vectors is the sum of the squares of the norms of the
vectors. ■



6.4 Projection theorem 

Many engineering problems are solved using the projection theorem. It is one of the most fundamental 
results in the Hilbert space theory. We state the theorem in two different ways and present some applications 
in this section. 

Definition 6.4.1 (Orthogonal complement) Let V be a Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and S be a
non-empty subset of V. The orthogonal complement of S, denoted by $S^\perp$, is the set of all elements of V that
are orthogonal to S, that is,

$$S^\perp = \{ x \in V : \langle x, y \rangle = 0 \text{ for all } y \in S \}$$

Put another way, x is in the orthogonal complement of S if and only if x is orthogonal to S.



Example 6.4.1 Consider $\mathbb{R}^2$ with the standard inner product.

1. Let S = {0}. We claim that $S^\perp = \mathbb{R}^2$. This is easily seen as follows:

$$S^\perp = \{ x \in \mathbb{R}^2 : \langle x, y \rangle = 0 \text{ for all } y \in S \} = \{ x \in \mathbb{R}^2 : \langle x, 0 \rangle = 0 \} = \mathbb{R}^2$$

2. Now, suppose that

$$S = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}$$

Then,

$$\begin{aligned}
S^\perp &= \{ x \in \mathbb{R}^2 : \langle x, y \rangle = 0 \text{ for all } y \in S \} \\
&= \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2 : x_1 + x_2 = 0 \right\} = \operatorname{Span}\left( \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right)
\end{aligned}$$

In these examples, even when S is not a subspace, $S^\perp$ is. It should also be noted that S and the span of S
have the same orthogonal complement.

Proposition 6.4.1 (Property of orthogonal complement) Let V be a Hilbert space with inner product
$\langle \cdot, \cdot \rangle$ and S be a non-empty subset of V. The following statements are true.

1. $S^\perp$ is a closed subspace of V.

2. $S^\perp$ is the orthogonal complement of the span of S. ■



It should be noted that a subspace of a Hilbert space is not necessarily closed. But, finite dimensional 
subspaces are closed. The first statement of the above proposition states that orthogonal complement of a 
set is a closed subset, always. According to the second statement, the orthogonal complement of a set is also 
the orthogonal complement of the span of the set. We are now ready to state the projection theorem. 

Theorem 6.4.1 (Projection theorem - Abstract version) Let V be a Hilbert space with inner product
$\langle \cdot, \cdot \rangle$ and S be a closed subspace of V. Then,

$$V = S \oplus S^\perp = \{ z : \text{there exist unique } x \in S \text{ and } y \in S^\perp \text{ such that } z = x + y \}$$

that is, the Hilbert space is the direct orthogonal sum of S and its orthogonal complement. ■



This is perhaps the most important theorem in engineering applications. According to the theorem, given 
a closed subspace of a Hilbert space, we can decompose the Hilbert space into two components that are 
orthogonal to each other. 



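Example 6.4.2 (Application to $\mathbb{R}^2$) Consider the Hilbert space $\mathbb{R}^2$ with the standard inner product $\langle \cdot, \cdot \rangle$.
Let S be given by:

$$S = \operatorname{Span}\left( \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right)$$

Then,

$$S^\perp = \operatorname{Span}\left( \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right)$$

and by the projection theorem

$$\mathbb{R}^2 = \operatorname{Span}\left( \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) \oplus \operatorname{Span}\left( \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right)$$

This means that every $x \in \mathbb{R}^2$ can be written as the unique sum of a vector y in S and a vector z in $S^\perp$.
Moreover, y and z are orthogonal to each other.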
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We shall now state a more concrete version of the projection theorem. For this, the concept of an orthogonal 
projection is needed. 

Definition 6.4.2 (Orthogonal projection) Let V be a Hilbert space and S be a closed subspace of V.
According to the projection theorem, each x in V can be written uniquely as:

$$x = y + z$$

where $y \in S$ and $z \in S^\perp$.

1. The component of x in S, namely y in the above decomposition, is called the orthogonal projection of
x onto S.

2. The component of x in $S^\perp$, namely z in the above decomposition, is called the orthogonal projection
of x onto $S^\perp$.

3. The map $\mathcal{P}_S : V \to V$ that takes x to y is called the orthogonal projection onto S.

4. The map $\mathcal{P}_{S^\perp} : V \to V$ that takes x to z is called the orthogonal projection onto $S^\perp$.



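Example 6.4.3 (Example 6.4.2 continued) Note that any $x \in \mathbb{R}^2$ can be written as:

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \frac{1}{2}(x_1 + x_2) \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \frac{1}{2}(x_1 - x_2) \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

The first term on the right hand side is the orthogonal projection of x onto S, and the second term is the
orthogonal projection of x onto $S^\perp$.

Now, to compute the orthogonal projector $\mathcal{P}_S$, note that

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \mapsto \frac{1}{2}(x_1 + x_2) \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

under the projection operator. We can write this as:

$$\frac{1}{2}(x_1 + x_2) \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

so that

$$\mathcal{P}_S = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$

Similarly, we can compute $\mathcal{P}_{S^\perp}$.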
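
The decomposition in this example can be verified numerically. The sketch below is an illustration: the test vector and the helper `matvec` are choices made here, and the second matrix is obtained as the identity minus the projector onto S = Span([1, 1]), which follows from x = y + z. It applies both projectors to a vector in $\mathbb{R}^2$.

```python
def matvec(M, x):
    """Matrix-vector product for small dense lists."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

P_S = [[0.5, 0.5], [0.5, 0.5]]          # projector onto S = Span([1, 1])
P_S_perp = [[0.5, -0.5], [-0.5, 0.5]]   # identity minus P_S

x = [3.0, -1.0]
y = matvec(P_S, x)        # component of x in S
z = matvec(P_S_perp, x)   # component of x in the orthogonal complement of S

recombined = [yk + zk for yk, zk in zip(y, z)]
y_dot_z = sum(yk * zk for yk, zk in zip(y, z))
projected_twice = matvec(P_S, y)
```

The two components add back up to x and are orthogonal to each other, and applying the projector a second time leaves y unchanged.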



Theorem 6.4.2 (Projection theorem - Concrete version) Let V be a Hilbert space with inner product
$\langle \cdot, \cdot \rangle$, S be a closed subspace of V and $x \in V$. Consider the minimization problem:

$$\inf_{y \in S} \|x - y\|$$

that is, the problem of finding $y \in S$ that is closest to x. The following statements are true.

1. The infimal value of the optimization problem is given by $\|x - \mathcal{P}_S(x)\|$.

2. There exists a unique y in S that achieves the infimal value. This unique solution is given by
$y = \mathcal{P}_S(x)$.

3. The unique solution $\mathcal{P}_S(x)$ is orthogonal to the error $x - \mathcal{P}_S(x)$, that is,

$$\langle \mathcal{P}_S(x), x - \mathcal{P}_S(x) \rangle = 0$$

Here, $\mathcal{P}_S(x)$ is the orthogonal projection of x onto S. ■
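
As a small illustration of the theorem, consider a one-dimensional subspace S = Span(v) of $\mathbb{R}^3$. The vectors, the candidate grid and the helper names below are assumptions made for this sketch. For such a subspace the projection has the closed form $(\langle x, v \rangle / \langle v, v \rangle) v$, which follows from requiring the error to be orthogonal to v.

```python
import math

def inner(x, y):
    """Standard inner product on R^n."""
    return sum(xk * yk for xk, yk in zip(x, y))

def norm(x):
    """Euclidean norm."""
    return math.sqrt(inner(x, x))

def project_onto_line(x, v):
    """Orthogonal projection of x onto Span(v): (<x, v> / <v, v>) v."""
    c = inner(x, v) / inner(v, v)
    return [c * vk for vk in v]

x = [2.0, 0.0, 1.0]
v = [1.0, 1.0, 0.0]

p = project_onto_line(x, v)
error = [xk - pk for xk, pk in zip(x, p)]
best = norm(error)

# Distances from x to other points c*v in S, for a grid of coefficients c.
coefficients = [-2.0 + 0.25 * i for i in range(21)]
others = [norm([xk - c * vk for xk, vk in zip(x, v)]) for c in coefficients]
```

The error is orthogonal to the projection, and none of the sampled points of S is closer to x than the projection is.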
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Chapter 7 



Equilibrium Point and Linearization 



Nonlinear systems are very difficult to analyze and there are very few useful truly nonlinear techniques. 
Fortunately, the interest in many applications is the analysis of systems about specific operating points. 
It is possible sometimes to approximate the nonlinear system in the vicinity of an operating point with a 
linear system and, more importantly, deduce properties of the nonlinear system from those of its linear 
approximation. The approximation process is called linearization and is the subject of this chapter. We may 
also be interested in transitioning the system from an operating point to another. Linearization is still useful, 
but it leads to more complicated time-varying linear systems. 



7.1 Equilibrium point 

7.1.1 Systems without inputs and outputs 

Consider the nonlinear time-invariant continuous-time system:

$$\dot{x} = f(x) \qquad (7.1)$$

where $x(t) \in \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}^n$ is a smooth function (meaning infinitely differentiable). The
discrete-time analog is:

$$x(k+1) = f(x(k)) \qquad (7.2)$$

where $x(k) \in \mathbb{R}^n$ and, as before, f is a smooth function from $\mathbb{R}^n$ into $\mathbb{R}^n$.

Definition 7.1.1 (Equilibrium point) $x_0 \in \mathbb{R}^n$ is an equilibrium point of the continuous-time system (7.1)
if and only if $f(x_0) = 0$. $x_0 \in \mathbb{R}^n$ is an equilibrium point of the discrete-time system (7.2) if and only if
$f(x_0) = x_0$.

Equilibrium points are also known as zeros of f (in the continuous-time case) and fixed points (in the
discrete-time case). If $x_0$ is an equilibrium point of the continuous-time system (7.1), then the differential
equation (7.1) has a solution for all time $t \ge 0$ starting at the initial condition $x(0) = x_0$. Indeed, one such
solution is $x(t) = x_0$ for all $t \ge 0$. We shall refer to this solution as the equilibrium solution. Keep in mind
that an equilibrium point is a point in $\mathbb{R}^n$, whereas an equilibrium solution is a function of time that
happens to be constant. Thus, a continuous-time system starting at an equilibrium point stays there for all
time. It is easy to show that if a continuous-time system enters an equilibrium state at time $t_0$, then it stays
there for all time $t \ge t_0$. These statements hold for discrete-time systems also.

Example 7.1.1 Consider the system

$$\dot{x} = 0$$

where $x(t) \in \mathbb{R}^n$ (state space). Every point in $\mathbb{R}^n$ is an equilibrium point because the right hand side of
the above equation is zero no matter which point in the state space is considered. On the other hand, the
system

$$\dot{x} = x^2 + 1,$$

where $x(t) \in \mathbb{R}$, has no equilibrium point because the points where the right hand side becomes zero, i.e.
$x^2 + 1 = 0 \Rightarrow x = \pm j$, are not in its state space.

Example 7.1.2 Consider the system

$$\dot{x} = \sin(x)$$

where $x(t) \in \mathbb{R}$. To find the equilibrium points, we look for those points in the state space where the right
hand side is zero. That is, find all the solutions in $\mathbb{R}$ of:

$$\sin(x) = 0$$

The equilibrium points are clearly given by: $x = \ldots, -2\pi, -\pi, 0, \pi, 2\pi, \ldots$.

An equilibrium point (of a continuous-time or discrete-time system) is isolated if it has an open neighbor- 
hood that contains no other equilibrium point. Nonlinear systems can have isolated equilibrium points, a 
dense set of equilibrium points, etc (just about anything you can imagine). 

Example 7.1.3 The system in Example 7.1.2 has only isolated equilibrium points; the first system in Exam- 
ple 7.1.1 has a continuum of equilibrium points. A 

7.1.2 Systems with inputs 

Consider the nonlinear time-invariant continuous-time system:

$$\dot{x} = f(x, u) \qquad (7.3)$$

where $x(t) \in \mathbb{R}^n$, $u(t) \in \mathbb{R}^m$ and $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ is a smooth function. Its discrete-time analog is:

$$x(k+1) = f(x(k), u(k)) \qquad (7.4)$$

where as before k is the discrete-time index.

Definition 7.1.2 (Equilibrium point) A pair $(x_0, u_0) \in \mathbb{R}^n \times \mathbb{R}^m$ is an equilibrium point of the
continuous-time system (7.3) if $f(x_0, u_0) = 0$. A pair $(x_0, u_0) \in \mathbb{R}^n \times \mathbb{R}^m$ is an equilibrium point of
the discrete-time system (7.4) if $f(x_0, u_0) = x_0$.

Note that the definition involves a state vector $x_0$ and a control input vector $u_0$. This type of equilibrium
point is also known as a trim point and, in some cases, an operating point.



Example 7.1.4 Consider

$$\dot{x} = x^2 + u$$

where x(t) and u(t) are real numbers. To find the trim points, we solve for all real solutions of:

$$x^2 + u = 0$$

and obtain the set:

$$\{ (\sqrt{v}, -v), (-\sqrt{v}, -v) : v \in [0, \infty) \}$$

as the set of all solutions. Every point in the above set is an equilibrium point. The left hand side plot in
figure 7.1 shows part of this set.







Figure 7.1: Trim point curves for continuous-time system (left) and discrete-time system (right) 

On the other hand, the equilibrium points of the discrete-time system:

$$x(k+1) = x(k)^2 + u(k),$$

where x(k) and u(k) are real numbers, are given by real solutions of:

$$x = x^2 + u$$

The right hand side plot in figure 7.1 shows part of the curve defined by the above equation.
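
The two trim-point conditions of this example can be checked directly. The sketch below is illustrative and the function names are assumptions introduced here. It generates trim points of the continuous-time system from a parameter v >= 0, and for the discrete-time system solves x = x^2 + u for the input, giving u = x - x^2.

```python
import math

def f(x: float, u: float) -> float:
    """Right hand side shared by both systems in the example: x^2 + u."""
    return x * x + u

def continuous_trim_points(v: float):
    """Trim points (x0, u0) of x' = x^2 + u with u0 = -v, for v >= 0."""
    r = math.sqrt(v)
    return [(r, -v), (-r, -v)]

def discrete_trim_input(x0: float) -> float:
    """Input u0 that makes x0 a fixed point of x(k+1) = x(k)^2 + u(k)."""
    return x0 - x0 * x0

# Continuous time: f must vanish at every trim point.
continuous_residuals = [f(x0, u0) for v in (0.0, 1.0, 4.0, 9.0)
                        for (x0, u0) in continuous_trim_points(v)]

# Discrete time: f must return the state itself at every trim point.
discrete_states = [-2.0, 0.5, 3.0]
discrete_images = [f(x0, discrete_trim_input(x0)) for x0 in discrete_states]
```

In the continuous-time case the trim inputs are never positive, while in the discrete-time case every real state is a fixed point for exactly one input.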
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7.2 Linearization about an equilibrium point 



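Let x be a vector in $\mathbb{R}^n$ and denote by $x_i$ the ith component of x, i.e.,

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be a smooth function. For each $x \in \mathbb{R}^n$, the value of f at x, namely f(x), is a vector in
$\mathbb{R}^n$ which can be written in component form as:

$$f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x) \end{bmatrix}$$

Here, $f_i$ is the ith component of f.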



Example 7.2.1 Define $f : \mathbb{R}^2 \to \mathbb{R}^2$ as follows:

$$f(x) = \begin{bmatrix} x_1^2 + x_2^2 \\ x_2 + \sin(x_2) \end{bmatrix}$$

where $x_1$ and $x_2$ are the components of x. f is a function of two variables $(x_1, x_2)$ and takes values in $\mathbb{R}^2$.
Moreover,

$$f_1(x) = x_1^2 + x_2^2, \qquad f_2(x) = x_2 + \sin(x_2)$$

are components of f.



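Definition 7.2.1 (Jacobian) The Jacobian of f at x is the n x n matrix defined as:

$$\frac{\partial f}{\partial x} = \begin{bmatrix}
\partial f_1 / \partial x_1 & \partial f_1 / \partial x_2 & \cdots & \partial f_1 / \partial x_n \\
\partial f_2 / \partial x_1 & \partial f_2 / \partial x_2 & \cdots & \partial f_2 / \partial x_n \\
\vdots & \vdots & & \vdots \\
\partial f_n / \partial x_1 & \partial f_n / \partial x_2 & \cdots & \partial f_n / \partial x_n
\end{bmatrix}$$

where $\partial f_i / \partial x_j$ is the partial derivative of the ith component of f with respect to the jth component of x.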
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Figure 7.2: Trajectories emanating from initial conditions in the vicinity of an equilibrium point 

The Jacobian of f given in Example 7.2.1 can be computed easily as:

$$\frac{\partial f}{\partial x} = \begin{bmatrix} 2x_1 & 2x_2 \\ 0 & 1 + \cos(x_2) \end{bmatrix}$$

An important point to note is that the Jacobian is a matrix-valued function. It becomes a matrix only when
it is evaluated at a particular x. We use the notation:

$$\left. \frac{\partial f}{\partial x} \right|_{x_0}$$

for the Jacobian of f evaluated at $x = x_0$. This is an n x n constant matrix. Although our definition applies
to functions from $\mathbb{R}^n$ into $\mathbb{R}^n$, the extension to functions mapping $\mathbb{R}^n$ into $\mathbb{R}^m$ is clear.
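
A hand-computed Jacobian can be cross-checked against finite differences. The sketch below is an illustration: the evaluation point, step size and function names are choices made here. It compares the analytic Jacobian of the function from Example 7.2.1 with a central-difference approximation.

```python
import math

def f(x):
    """The function of Example 7.2.1."""
    x1, x2 = x
    return [x1 ** 2 + x2 ** 2, x2 + math.sin(x2)]

def jacobian_exact(x):
    """Analytic Jacobian of f, entry [i][j] = d f_i / d x_j."""
    x1, x2 = x
    return [[2 * x1, 2 * x2], [0.0, 1 + math.cos(x2)]]

def jacobian_fd(func, x, h: float = 1e-6):
    """Central-difference approximation of the Jacobian of func at x."""
    n = len(x)
    m = len(func(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

x0 = [1.0, 0.5]
exact = jacobian_exact(x0)
approx = jacobian_fd(f, x0)
max_error = max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))
```

Evaluating at a specific point turns the matrix-valued function into a constant matrix, as described above.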



7.2.1 Systems without inputs 

Consider the continuous-time system given in (7.1). Since f is smooth, it has a Taylor's series expansion
about $x_0$:

$$f(x) = f(x_0) + \left. \frac{\partial f}{\partial x} \right|_{x_0} (x - x_0) + \text{higher order terms}$$

which is valid for all x in an open neighborhood of $x_0$. Note that the coefficient of $(x - x_0)$ in the Taylor's
series expansion is the Jacobian of f evaluated at $x = x_0$. Let us denote this matrix by A. If $x_0$ is an
equilibrium point of the continuous-time system, then $f(x_0) = 0$ and

$$f(x) \approx \left. \frac{\partial f}{\partial x} \right|_{x_0} (x - x_0) = A(x - x_0)$$

which is accurate up to first order. That is, in the vicinity of an equilibrium point, the (possibly) nonlinear
differential equation (7.1) can be written as:

$$\dot{x} = A(x - x_0) + \text{small terms} \qquad (7.5)$$

Now, let us think in terms of solutions of the differential equation (7.1). Since $x_0$ is an equilibrium point by
assumption, it is an equilibrium solution. Pick an initial condition $x_{ic}$ that is close to $x_0$. It can be proved
using smoothness of $f$ that there is a time interval $[0, T)$ over which the nonlinear system has a solution
starting from $x_{ic}$. Along this solution, the expression:

$$\dot{x} - A(x - x_0) \approx 0$$

holds for as long as the solution stays in the vicinity of $x_0$ where (7.5) is valid. In fact, there is a time
$0 < T_1 < T$ such that the approximate condition holds for all time in the interval $[0, T_1)$. We can write the
above expression as:

$$\frac{d}{dt}(x - x_0) - A(x - x_0) \approx 0 \qquad (7.6)$$

since $x_0$ is a constant.

Consider the linear time-invariant (LTI) system:

$$\dot{z} = Az$$

with initial condition $z(0) = x_{ic} - x_0$. Define

$$\tilde{x} = x_0 + z$$

We claim that $\tilde{x}$ is close to $x$. Indeed,

$$\dot{x} - \dot{\tilde{x}} = f(x) - \frac{d}{dt}(x_0 + z) = f(x) - \dot{z} = f(x) - Az \approx f(x) - A(x - x_0) \approx 0$$

which along with $x(0) - \tilde{x}(0) = 0$ implies that $x$ and $\tilde{x}$ are close to each other in a sufficiently small time
interval. In summary, the solution $z$ of the LTI system is such that $x_0 + z$ is close to the solution of the
nonlinear system for at least a small time interval. This is true of any initial condition $x_{ic}$ that is close to the
equilibrium point $x_0$. The equation $\dot{z} = Az$ is the linearization of the nonlinear system about the equilibrium
point $x_0$. A similar argument can be made for the discrete-time system (7.2) to arrive at the linear difference
equation $z_{k+1} = Az_k$.



Definition 7.2.2 (Linearization - no input case) The linearization of a continuous-time (resp. discrete-time)
nonlinear system (7.1) about the equilibrium point $x_0$ (resp. (7.2) about $x_0$) is given by

Continuous-Time: $\dot{z} = Az$
Discrete-Time: $z_{k+1} = Az_k$

where $A$ is the Jacobian of $f$ evaluated at $x_0$. ★



The linearization of a nonlinear system is a linear dynamical system. When the linearization is performed about
an equilibrium point of a time-invariant nonlinear system, then the linearization is a linear time-invariant
(LTI) dynamical system.

Example 7.2.2 The equilibrium points of the system:

$$\dot{x} = \sin(x),$$

where $x(t) \in \mathbb{R}$, were determined earlier to be:

$$\{\ldots, -2\pi, -\pi, 0, \pi, 2\pi, \ldots\} = \{k\pi\}_{k=-\infty}^{\infty}$$

The Jacobian of "$f$" evaluated at a generic equilibrium point $x_0 = k\pi$ is:

$$\left. \frac{d}{dx} \sin(x) \right|_{x=k\pi} = \cos(x) \big|_{x=k\pi} = (-1)^{|k|}$$

So, the linearization at $x_0 = k\pi$ is:

$$\dot{x} = \begin{cases} x & \text{if } |k| \text{ is zero or even} \\ -x & \text{otherwise} \end{cases}$$

for any integer $k$. △
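The claim that $x_0 + z$ stays close to the nonlinear solution can be illustrated numerically for this example. The sketch below is an illustration under stated assumptions, not part of the original text: it picks the equilibrium $x_0 = \pi$ (so the linearization has coefficient $\cos(\pi) = -1$), an initial offset of 0.1, and a hand-written fourth-order Runge-Kutta integrator (`rk4_scalar`, a hypothetical helper).

```python
import math

def rk4_scalar(f, x, dt, steps):
    """Integrate the scalar ODE x' = f(x) with classical RK4; returns all samples."""
    traj = [x]
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        traj.append(x)
    return traj

x0 = math.pi              # equilibrium point with k = 1
A = math.cos(x0)          # Jacobian at x0, equal to -1
x_ic = x0 + 0.1           # initial condition close to x0 (assumed offset)
dt, steps = 0.01, 200     # integrate over [0, 2]

# Nonlinear solution of x' = sin(x) starting from x_ic.
x_nl = rk4_scalar(math.sin, x_ic, dt, steps)

# Linearization: z' = A z with z(0) = x_ic - x0, so z(t) = exp(A t) z(0).
x_lin = [x0 + (x_ic - x0) * math.exp(A * k * dt) for k in range(steps + 1)]

# Largest gap between the nonlinear solution and x0 + z over the interval.
max_gap = max(abs(a - b) for a, b in zip(x_nl, x_lin))
```

With this small offset the gap stays orders of magnitude below the offset itself; a larger offset would be expected to widen it, since the neglected higher order terms grow.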



7.2.2 Systems with inputs 

Consider the continuous-time system given in (7.3). The Taylor's series expansion of $f$ about $(x_0, u_0)$ is:

$$f(x,u) = f(x_0,u_0) + \left. \frac{\partial f}{\partial x} \right|_{(x_0,u_0)} (x - x_0) + \left. \frac{\partial f}{\partial u} \right|_{(x_0,u_0)} (u - u_0) + \text{higher order terms}$$

which is valid for all $(x, u)$ in an open neighborhood of $(x_0, u_0)$. The symbol

$$\frac{\partial f}{\partial u}$$

stands for the Jacobian of $f$ with respect to $u$ and has a definition similar to the Jacobian given in Definition 7.2.1.
Let us denote these two Jacobians by $A$ and $B$ respectively. Proceeding as before, we arrive at
the following definition of linearization.



Definition 7.2.3 (Linearization - with inputs) The linearization of the continuous-time (resp. discrete-time)
system with inputs (7.3) about the equilibrium point $(x_0, u_0)$ (resp. (7.4) about $(x_0, u_0)$) is given
by:

Continuous-Time: $\dot{z} = Az + Bv$
Discrete-Time: $z_{k+1} = Az_k + Bv_k$

where $A$ and $B$ are the Jacobians of $f$ with respect to $x$ and $u$ evaluated at $(x_0, u_0)$. ★

7.2.3 Systems with inputs and outputs 

Consider the continuous-time system given by (7.3) along with the output equation:

$$y = h(x,u)$$

Let $(x_0, u_0)$ be an equilibrium point of this system. Assume that $h(x_0, u_0) = 0$. Then, the linearization
about $(x_0, u_0)$ is given by:

$$\dot{z} = Az + Bv$$
$$y = Cz + Dv$$

where

$$A = \left. \frac{\partial f}{\partial x} \right|_{(x_0,u_0)}, \quad B = \left. \frac{\partial f}{\partial u} \right|_{(x_0,u_0)}, \quad C = \left. \frac{\partial h}{\partial x} \right|_{(x_0,u_0)} \quad \text{and} \quad D = \left. \frac{\partial h}{\partial u} \right|_{(x_0,u_0)}$$



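Example 7.2.3 It is easy to see by substitution that $(x = 0, u = 0)$ is an equilibrium point of

$$\dot{x} = -x^4 \sin(x) + \sin u$$
$$y = x + u \cos u$$

To linearize about this equilibrium point, we first calculate and evaluate the Jacobians:

$$A = \left. \frac{\partial f}{\partial x} \right|_{(x_0,u_0)} = \left. \frac{\partial}{\partial x} \left( -x^4 \sin(x) + \sin u \right) \right|_{(0,0)} = \left. \left( -4x^3 \sin(x) - x^4 \cos(x) \right) \right|_{(0,0)} = 0$$

$$B = \left. \frac{\partial f}{\partial u} \right|_{(x_0,u_0)} = \left. \frac{\partial}{\partial u} \left( -x^4 \sin(x) + \sin u \right) \right|_{(0,0)} = \left. \cos(u) \right|_{(0,0)} = 1$$

$$C = \left. \frac{\partial h}{\partial x} \right|_{(x_0,u_0)} = \left. \frac{\partial}{\partial x} \left( x + u \cos(u) \right) \right|_{(0,0)} = \left. (1) \right|_{(0,0)} = 1$$

$$D = \left. \frac{\partial h}{\partial u} \right|_{(x_0,u_0)} = \left. \frac{\partial}{\partial u} \left( x + u \cos(u) \right) \right|_{(0,0)} = \left. \left( \cos(u) - u \sin(u) \right) \right|_{(0,0)} = 1$$

Then, we form:

$$\dot{z} = Az + Bv = v$$
$$y = Cz + Dv = z + v$$

which is the linearization about $(0, 0)$. △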
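The four Jacobians of Example 7.2.3 can be cross-checked with finite differences. This is a small illustrative sketch rather than a prescribed method; the helper `partial` and the step size are assumptions introduced here.

```python
import math

def f(x, u):
    """State equation of Example 7.2.3: x' = -x^4 sin(x) + sin(u)."""
    return -x**4 * math.sin(x) + math.sin(u)

def h(x, u):
    """Output equation of Example 7.2.3: y = x + u cos(u)."""
    return x + u * math.cos(u)

def partial(func, x, u, wrt, step=1e-6):
    """Central-difference partial derivative of func with respect to 'x' or 'u'."""
    if wrt == "x":
        return (func(x + step, u) - func(x - step, u)) / (2.0 * step)
    return (func(x, u + step) - func(x, u - step)) / (2.0 * step)

x0, u0 = 0.0, 0.0               # equilibrium point from the example
A = partial(f, x0, u0, "x")     # approximately 0
B = partial(f, x0, u0, "u")     # approximately 1
C = partial(h, x0, u0, "x")     # approximately 1
D = partial(h, x0, u0, "u")     # approximately 1
```

The approximate values agree with the linearization $\dot{z} = v$, $y = z + v$ obtained in the example.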



Chapter 8 



LTI System Behavior 



A continuous-time LTI system has the form:

$$\dot{x} = Ax + Bu, \quad x(0) = x_0 \qquad (8.1a)$$

$$y = Cx + Du \qquad (8.1b)$$

where $x(t) \in \mathbb{R}^n$ is the state, $u(t) \in \mathbb{R}^m$ is the input and $y(t) \in \mathbb{R}^p$ is the output. The state space
matrices $(A, B, C, D)$ are constant real matrices of appropriate dimensions and $x_0$ is the initial condition.
A solution of the differential equation (8.1a) emanating from $x_0$ in response to the input $u$ is a function
$x : [0, \infty) \to \mathbb{R}^n$ such that

$$x(0) = x_0 \quad \text{and} \quad \dot{x} = Ax + Bu$$

i.e., it must satisfy the initial condition and the differential equation (8.1a). An explicit formula for the
solution is derived in this chapter. Once the formula for $x$ is known, we can easily obtain an expression for
the output $y$ by simply substituting the formula for $x$ into (8.1b). The discrete-time LTI system:

$$x_{k+1} = Ax_k + Bu_k, \quad x(0) = x_0 \qquad (8.2a)$$

$$y_k = Cx_k + Du_k \qquad (8.2b)$$

will also be studied; it turns out to be much easier than the continuous-time system.

8.1 Solution of continuous-time LTI equations 

The solution is given in three steps. First, we consider the case without inputs:

$$\dot{x} = Ax, \quad x(0) = x_0 \qquad (8.3)$$

that is, $u = 0$ in equation (8.1a). The system responds in this case to the initial condition and, hence, the
solution is called initial condition or unforced or zero input response. In the second step, we set the initial
condition to zero and consider the effect of the input $u$. The solution in this case is called input or forced or
zero initial condition response. Finally, the general case with both initial conditions and inputs is considered.
We are able to do this because of a fundamental consequence of linearity, namely the superposition principle.
Roughly speaking, the principle states that the total effect of a number of causes is the sum of the effects of
the individual causes. So, the solution in the general case is simply the sum of initial condition response and
forced response.

Theorem 8.1.1 (Initial condition response) The solution $x$ of (8.3) starting at the initial condition $x_0$ is
given by:

$$x(t) = e^{At} x_0$$

for all $t \ge 0$. ■

The definition and properties of the exponential function $e^A$ can be found in Chapter 4. It plays a key role in
describing the initial condition response and, as we shall soon see, forced response. Some features of initial
condition response that readily follow from linearity and properties of the exponential function are listed
below.



Theorem 8.1.2 (Properties of initial condition response) The following statements are true.

1. Suppose that $x$ and $y$ are initial condition responses starting from $x_0$ and $y_0$ respectively at time
$t = 0$. Let $\alpha$ and $\beta$ be two real numbers. Then, $\alpha x + \beta y$ is the initial condition response starting from
$\alpha x_0 + \beta y_0$ at time $t = 0$.

2. Suppose that $(\lambda, v)$ is an eigenvalue-eigenvector pair of $A$. Then, for any initial condition in the
eigen-subspace spanned by $v$, the corresponding initial condition response stays in the eigen-subspace
spanned by $v$ for all time. That is,

$$x(0) = \alpha v \text{ for some real number } \alpha \implies x(t) = \alpha e^{\lambda t} v \text{ for all } t \ge 0$$

3. Suppose that $\mathcal{E}$ is an eigen-subspace of $A$. Then, for any initial condition $x_0 \in \mathcal{E}$, the corresponding
response $x$ is such that $x(t) \in \mathcal{E}$ for all $t \ge 0$. ■

The first statement follows from linearity. The last two statements are very important. They say that if 
initial conditions are chosen to lie in eigen-subspaces of A, then the response never leaves the subspace. In 
mathematical terms, we say that eigen-subspaces of A are flow-invariant. The unforced system (8.3) has 
other important system-theoretic properties that are algebraic or group-theoretic in nature. To discuss these, 
we make the following important definition. 



Definition 8.1.1 (Transition matrix) The transition matrix of the LTI system (8.1) is defined as:

$$\phi(t, s) = e^{A(t-s)}$$

for all $t, s$.

The transition matrix is a matrix-valued function of two time indices, $t$ and $s$. When $t > s$, we can think
of $t - s$ as the elapsed time. Clearly, the initial condition response given in Theorem 8.1.1 can be written
as $x(t) = \phi(t, 0)x_0$. This equation can be interpreted as follows. Note that $s = 0$ and, hence, the elapsed
time is $t$. So, if the system is at the state $x_0$, then after an elapsed time of $t$, the system will be at the state
$\phi(t, 0)x_0$. The next theorem describes properties of $\phi$ and provides an intuitive explanation of what it means
to be a transition matrix of a system.

Theorem 8.1.3 (Properties of transition matrix) Let $\phi$ denote the transition matrix of the system (8.3)
with no inputs. The following statements are true.

1. $\phi(t, t) = I$ for any $t$, i.e., the state of a system is the same after $t - t = 0$ time has elapsed.

2. $\phi(s, t) = \phi(t, s)^{-1}$ for any $t$ and $s$. Thus, if $x(t)$ and $x(s)$ are two points on the solution $x$ at times $t$
and $s$, then they are related as follows:

$$x(s) = \phi(s,t)x(t) = e^{A(s-t)}x(t) = e^{-A(t-s)}x(t) = \phi(t,s)^{-1}x(t)$$

3. $\phi(u, t) = \phi(u, s)\phi(s, t)$ for any $t$, $s$ and $u$, i.e., transitioning from $t$ to $u$ is the same as transitioning from
$t$ to $s$ and then from $s$ to $u$. Thus,

$$x(u) = \phi(u,t)x(t) = \phi(u,s)\left(\phi(s,t)x(t)\right) = \phi(u,s)x(s)$$

for any three points $x(t)$, $x(s)$ and $x(u)$ on the solution.

4. $\phi(t, s)$ satisfies:

$$\frac{\partial}{\partial t}\phi(t,s) = A\phi(t,s)$$

for any $t$ and $s$.



An important property that is implicit in the above theorem is that the transition matrix is invertible for all
values of its arguments. It is a consequence of the definition of the transition matrix; from a system-theoretic
point of view, it means that a starting point and an elapsed time completely define the end point and vice versa.



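Example 8.1.1 Let

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

which is in Jordan form. Hence, by Theorem 4.2.4,

$$e^{At} = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$$

for all $t$. Therefore, the transition matrix of (8.3) with this particular $A$ is:

$$\phi(t,s) = \begin{bmatrix} 1 & t-s \\ 0 & 1 \end{bmatrix}$$

for all $t, s$. △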
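The transition matrix of this example, and the properties listed in Theorem 8.1.3, can be checked numerically. The sketch below is illustrative only: it evaluates the matrix exponential by a truncated Taylor series (adequate here because this $A$ is nilpotent, but not a recommended general-purpose method), and the helper names `mat_mul`, `identity`, `expm_series` and `phi` are assumptions introduced for the illustration.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm_series(A, t, terms=30):
    """Truncated Taylor series: sum over k < terms of (A t)^k / k!."""
    n = len(A)
    At = [[a * t for a in row] for row in A]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        term = mat_mul(term, At)
        term = [[v / k for v in row] for row in term]   # now (A t)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

A = [[0.0, 1.0], [0.0, 0.0]]

def phi(t, s):
    """Transition matrix phi(t, s) = exp(A (t - s))."""
    return expm_series(A, t - s)

t, s, u = 1.0, 3.0, 4.5                      # arbitrary time indices
composed = mat_mul(phi(u, s), phi(s, t))     # should equal phi(u, t)
inverse_check = mat_mul(phi(s, t), phi(t, s))  # should equal the identity
```

For this $A$, `phi(3.0, 1.0)` evaluates to the matrix with rows `[1, 2]` and `[0, 1]`, matching the closed form with $t - s = 2$.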
Let us now consider the case with input u and zero initial condition. We have the following theorem. 

Theorem 8.1.4 (Forced response) The solution $x$ of (8.1a) with initial condition $x_0 = 0$ is given by:

$$x(t) = \int_0^t e^{A(t-\tau)} B u(\tau) \, d\tau$$

for all $t \ge 0$. ■

The integral appearing above is an example of a convolution integral which is a fundamental quantity in
systems theory. In terms of the transition matrix $\phi$, the forced solution is:

$$x(t) = \int_0^t \phi(t,\tau) B u(\tau) \, d\tau$$

It is generally difficult to evaluate these convolutions. The Laplace transform can be used in some special 
cases. But, numerical integration methods are usually applied on the differential equation (8.1a) to get the 
solution. The explicit formula for x in Theorem 8.1.4 has important theoretical applications and will be 
often used in subsequent chapters. Proof of the theorem is easy and involves applying the Leibniz rule: 



Theorem 8.1.5 (Calculus and Leibniz rule) Suppose that $\phi$ is a matrix valued differentiable function of
two scalar variables $t$ and $\tau$. Then,

$$\frac{d}{dt} \int_{f(t)}^{g(t)} \phi(t,\tau) \, d\tau = \phi(t,g(t)) \frac{dg}{dt}(t) - \phi(t,f(t)) \frac{df}{dt}(t) + \int_{f(t)}^{g(t)} \frac{\partial}{\partial t} \phi(t,\tau) \, d\tau$$

for any differentiable functions $f$ and $g$. ■

We complete the solution of the system (8.1) below. As mentioned, since the system is linear, the total 
solution is just the sum of initial condition response given by Theorem 8.1.1 and forced response given by 
Theorem 8.1.4. 



Theorem 8.1.6 (Complete solution) Consider the system (8.1). The state trajectory $x$ starting at the initial
condition $x_0$ and evolving under the input $u$ is given by:

$$x(t) = e^{At} x_0 + \int_0^t e^{A(t-\tau)} B u(\tau) \, d\tau$$

for all $t \ge 0$. The corresponding output trajectory $y$ is given by:

$$y(t) = Cx(t) + Du(t) = C \left( e^{At} x_0 + \int_0^t e^{A(t-\tau)} B u(\tau) \, d\tau \right) + Du(t)$$

for all $t \ge 0$.
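As a sanity check of the complete-solution formula, the sketch below compares it with direct numerical integration of the differential equation for a scalar system. Everything specific here is an assumption chosen for illustration: the values $a = -2$, $b = 1$, $x_0 = 0.5$, the input $u(t) = \sin t$, the trapezoidal rule used for the convolution integral, and the Runge-Kutta integrator.

```python
import math

a, b = -2.0, 1.0      # scalar "A" and "B" (assumed values)
x0 = 0.5              # initial condition (assumed)
u = math.sin          # input signal u(t) = sin(t) (assumed)
t_final = 3.0

def formula(t, n=2000):
    """x(t) = e^{a t} x0 + integral_0^t e^{a (t - tau)} b u(tau) d tau, trapezoidal rule."""
    step = t / n
    g = lambda tau: math.exp(a * (t - tau)) * b * u(tau)
    integral = 0.5 * step * (g(0.0) + g(t)) + step * sum(g(i * step) for i in range(1, n))
    return math.exp(a * t) * x0 + integral

def simulate(t, n=3000):
    """Integrate x' = a x + b u(t) from x(0) = x0 with classical RK4."""
    step = t / n
    x, s = x0, 0.0
    rhs = lambda s, x: a * x + b * u(s)
    for _ in range(n):
        k1 = rhs(s, x)
        k2 = rhs(s + step / 2, x + step / 2 * k1)
        k3 = rhs(s + step / 2, x + step / 2 * k2)
        k4 = rhs(s + step, x + step * k3)
        x += step * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += step
    return x

x_formula = formula(t_final)   # initial condition response plus forced response
x_sim = simulate(t_final)      # direct integration of the differential equation
```

The two numbers agree to within the quadrature error, which reflects the remark above that, in practice, the solution is usually obtained by integrating the differential equation while the formula serves theoretical purposes.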
8.2 Solution of discrete-time LTI equations 

The solution in the discrete-time case is easy to obtain as the defining relation (8.2a) is a difference equation
and not a differential equation. Clearly, when $k = 1$, the difference equation (8.2a) gives:

$$x_1 = Ax_0 + Bu_0$$

which is a formula for the state at time $k = 1$ in terms of the initial condition $x_0$ and the value $u_0$ of the
input $u$ at time $k = 0$. Let us set $k = 2$. Then, from the difference equation and the above formula for $x_1$,
we get:

$$\begin{aligned} x_2 &= Ax_1 + Bu_1 \\ &= A(Ax_0 + Bu_0) + Bu_1 \\ &= A^2 x_0 + ABu_0 + Bu_1 \end{aligned}$$

which is a formula for the state at time $k = 2$. Now, set $k = 3$ and proceed as before to get:

$$\begin{aligned} x_3 &= Ax_2 + Bu_2 \\ &= A(A^2 x_0 + ABu_0 + Bu_1) + Bu_2 \\ &= A^3 x_0 + A^2 Bu_0 + ABu_1 + Bu_2 \\ &= A^3 x_0 + \sum_{i=0}^{2} A^{2-i} B u_i \end{aligned}$$

Repeating this procedure of applying the difference equation and eliminating variables using previously
obtained formulas leads to the following theorem.

Theorem 8.2.1 (Complete solution for discrete-time system) Consider the system (8.2). The state trajectory
$x$ starting at the initial condition $x_0$ and evolving under the input $u$ is given by:

$$x_k = A^k x_0 + \sum_{i=0}^{k-1} A^{k-1-i} B u_i$$

for all $k \ge 0$. The corresponding output trajectory $y$ is given by:

$$y_k = Cx_k + Du_k = C \left( A^k x_0 + \sum_{i=0}^{k-1} A^{k-1-i} B u_i \right) + Du_k$$

for all $k \ge 0$. ■
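The closed-form expression of Theorem 8.2.1 can be compared against the recursion (8.2a) directly. The following sketch is an illustration under assumptions: a single-input system so that $B$ is stored as a column (a plain list) and each $u_i$ is a scalar, with arbitrary numerical values for $A$, $B$, $x_0$ and the input sequence. The function names are hypothetical.

```python
def mat_vec(M, v):
    """Matrix-vector product with M stored as a list of rows."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def apply_power(M, p, v):
    """Return M^p v by applying M to v a total of p times."""
    for _ in range(p):
        v = mat_vec(M, v)
    return v

def recursion(A, B, x0, u, k):
    """Iterate x_{i+1} = A x_i + B u_i for i = 0, ..., k-1."""
    x = list(x0)
    for i in range(k):
        Ax = mat_vec(A, x)
        x = [Ax[r] + B[r] * u[i] for r in range(len(x))]
    return x

def closed_form(A, B, x0, u, k):
    """x_k = A^k x0 + sum_{i=0}^{k-1} A^{k-1-i} B u_i."""
    x = apply_power(A, k, list(x0))
    for i in range(k):
        term = apply_power(A, k - 1 - i, [b * u[i] for b in B])
        x = [x[r] + term[r] for r in range(len(x))]
    return x

A = [[0.5, 1.0], [0.0, 0.8]]       # assumed example data
B = [0.0, 1.0]
x0 = [1.0, -1.0]
u = [1.0, 0.5, -2.0, 0.0, 3.0]

x5_rec = recursion(A, B, x0, u, 5)
x5_cf = closed_form(A, B, x0, u, 5)
```

For $k = 0$ the sum is empty and both functions return $x_0$, consistent with the formula holding for all $k \ge 0$.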

As in the continuous-time case, the state trajectory is the sum of an initial condition response

$$A^k x_0$$

and a forced response:

$$\sum_{i=0}^{k-1} A^{k-1-i} B u_i$$

The forced response is now a sum, the discrete analog of the integral involved in the continuous-time case.
This sum is an example of a discrete convolution. Another important point to note is that the transition
matrix in the discrete-time case is given by:

$$\phi(k,m) = A^{k-m}$$

where $k$ and $m$ are any two time indices. It has all the properties stated in Theorem 8.1.3. Finally, for later
use, let us write the formula for $x$ in the following form:

$$x_k = A^k x_0 + \sum_{i=0}^{k-1} A^{k-1-i} B u_i = A^k x_0 + \begin{bmatrix} B & AB & A^2 B & \cdots & A^{k-1} B \end{bmatrix} \begin{bmatrix} u_{k-1} \\ u_{k-2} \\ \vdots \\ u_1 \\ u_0 \end{bmatrix}$$

The matrix

$$\begin{bmatrix} B & AB & A^2 B & \cdots & A^{k-1} B \end{bmatrix}$$

will play an important role in later chapters.



Chapter 9 

Lyapunov (Internal) Stability Notions 



This chapter introduces several concepts of internal stability of state space models of dynamical systems.
They originated in the work of A. M. Lyapunov and, hence, are referred to as Lyapunov stability. Nonlinear
systems can also exhibit stable behaviors that fall outside Lyapunov theory. We shall not consider such
behaviors.

9.1 Nonlinear time-invariant systems 

Consider the continuous-time system:

$$\dot{x} = f(x), \quad x(0) = x_0 \qquad (9.1)$$

where $f : \mathbb{R}^n \to \mathbb{R}^n$ is smooth. We will focus exclusively on continuous-time systems as the definitions
can be adapted to the discrete-time case. Throughout, $x$ denotes the solution of (9.1) emanating from $x_0$ at
$t = 0$ (if it exists). The value of this solution at a specific point in time $t$ will be denoted by $x(t)$. We use
$\|v\|$ to denote the Euclidean norm of a vector $v$.

Definition 9.1.1 (Boundedness) A solution $x$ of (9.1) is bounded if and only if there exists $\beta > 0$ such that

$$\|x(t)\| \le \beta \quad \text{for all } t \ge 0$$

A solution that is not bounded is called unbounded. ★

The definition means that the solution is contained in a ball in $\mathbb{R}^n$ of radius $\beta$ centered at 0 for all time. The
radius of the ball $\beta$ could depend on the initial condition $x_0$.

Example 9.1.1 The solution of $\dot{x} = 0$ starting from any initial condition is bounded because $x(t) = x_0$ for
all $t \ge 0$ (estimate $\beta$ in the definition). All solutions, except the zero solution starting from $x_0 = 0$, of
$\dot{x} = x$ are unbounded because $x(t) = e^t x_0$ which grows beyond any finite ball except when $x_0 = 0$. All
solutions of $\dot{x} = -x^3$ are bounded. △



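Example 9.1.2 Consider the LTI systems:

$$\dot{x}_1 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} x_1 \quad \text{and} \quad \dot{x}_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x_2$$

Clearly,

$$x_1(t) = e^{\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} t} x_1(0) = x_1(0)$$

and

$$x_2(t) = e^{\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} t} x_2(0) = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix} x_2(0)$$

$x_1$ is bounded for any initial condition; but $x_2$ is not always bounded. For example, with an initial condition
$x_2(0)$ whose second component is zero, we get:

$$x_2(t) = x_2(0) \quad \text{for all } t \ge 0$$

which is bounded; but with the initial condition:

$$x_2(0) = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

we get:

$$x_2(t) = \begin{bmatrix} 1 + 2t \\ 2 \end{bmatrix} \quad \text{for all } t \ge 0$$

which is unbounded since the first component grows linearly with time. △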



Definition 9.1.2 (Local Stability (LS)) Let $x_e$ be an equilibrium point of the system given by (9.1). $x_e$ is
locally stable in the sense of Lyapunov if and only if for each $\epsilon > 0$ there exists $\delta > 0$ such that whenever
the initial condition $x(0)$ satisfies

$$\|x(0) - x_e\| < \delta,$$

the resulting solution $x$ satisfies

$$\|x(t) - x_e\| < \epsilon \quad \text{for all } t \ge 0$$

An equilibrium point that is not locally stable is called unstable. ★

Clearly, if the initial condition is taken to be $x(0) = x_e$, then the corresponding solution is $x(t) = x_e$ and
we have $\|x(t) - x_e\| = 0$ for all time. For stability, we need to know how the solutions starting from a
neighborhood of the equilibrium point behave. The definition means the following. Suppose that the system
is LS about $x_e$. Pick a ball centered at $x_e$ of radius $\epsilon > 0$. Then, we can find a ball centered at $x_e$ of radius
$\delta > 0$ such that if the system is started from any point inside the $\delta$-ball, the corresponding solution will not
leave the $\epsilon$-ball. There are a couple of things to note:

• The $\epsilon$-ball is picked first. So, $\delta$ depends on $\epsilon$.

• The definition says that some $\delta > 0$ must exist for every $\epsilon > 0$. However, we have the following.
Suppose that for a certain $\epsilon_0 > 0$ there is a $\delta_0 > 0$ that satisfies the requirements in the definition.
Then, there is an obvious choice for $\delta$ that works for any $\epsilon > \epsilon_0$, namely $\delta = \delta_0$.

• LS implies boundedness of solutions emanating from the vicinity of $x_e$. Boundedness of solutions
does not imply LS (for nonlinear systems). A good example for this is a system exhibiting a limit
cycle.

Example 9.1.3 Every $v \in \mathbb{R}^n$ is a LS equilibrium point of $\dot{x} = 0$. 0 is an unstable equilibrium point of
$\dot{x} = x$. 0 is a LS equilibrium point of $\dot{x} = -x^3$. △

Example 9.1.4 (Bounded Solutions, Unstable Equilibrium Point) Consider the Van der Pol oscillator

$$\dot{x}_1 = x_2$$

$$\dot{x}_2 = (1 - x_1^2) x_2 - x_1$$

All solutions are bounded; but the equilibrium point $(0, 0)$ is unstable. △
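The behavior described in Example 9.1.4 can be observed in simulation. The sketch below is only an illustration of one trajectory, not a proof of either claim; the initial condition, step size, horizon and the RK4 integrator are assumptions introduced here.

```python
import math

def vdp(x):
    """Van der Pol vector field of Example 9.1.4."""
    x1, x2 = x
    return [x2, (1.0 - x1 * x1) * x2 - x1]

def rk4_step(f, x, dt):
    """One classical Runge-Kutta step for a two-dimensional system."""
    k1 = f(x)
    k2 = f([x[i] + 0.5 * dt * k1[i] for i in range(2)])
    k3 = f([x[i] + 0.5 * dt * k2[i] for i in range(2)])
    k4 = f([x[i] + dt * k3[i] for i in range(2)])
    return [x[i] + dt * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]) / 6.0
            for i in range(2)]

def norms_along_solution(x_init, dt=0.01, steps=4000):
    """Euclidean norm of the state at each sample over [0, steps * dt]."""
    x = list(x_init)
    norms = [math.hypot(x[0], x[1])]
    for _ in range(steps):
        x = rk4_step(vdp, x, dt)
        norms.append(math.hypot(x[0], x[1]))
    return norms

norms = norms_along_solution([0.01, 0.0])   # start very close to (0, 0)
largest = max(norms)                        # stays finite: bounded trajectory
late_largest = max(norms[-1000:])           # far from 0: trajectory left the origin
```

Starting at distance 0.01 from the origin, the simulated trajectory moves away and settles onto a closed orbit of moderate size, which is the limit-cycle behavior mentioned in the notes following Definition 9.1.2.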

Definition 9.1.3 (Local Attractor) An equilibrium point $x_e$ of the system (9.1) is a local attractor if and
only if there exists $\beta > 0$ such that whenever the initial condition $x(0)$ satisfies

$$\|x(0) - x_e\| < \beta,$$

the resulting solution satisfies

$$\lim_{t \to \infty} x(t) = x_e$$

(i.e., converges asymptotically to the equilibrium point).

Definition 9.1.4 (Local Asymptotic Stability (LAS)) Let $x_e$ be an equilibrium point of the system given by
(9.1). $x_e$ is locally asymptotically stable in the sense of Lyapunov if and only if the following two conditions
hold:

1. $x_e$ is LS

2. $x_e$ is a local attractor. ■

The second condition in the definition means that all solutions starting from a neighborhood converge to the
equilibrium point as time goes to $\infty$. In other words, a LAS equilibrium point is also an attractor. We might
think that the converse is true, i.e., if all solutions starting from a neighborhood of the equilibrium point
come arbitrarily close to it after some time (or even converge to it as $t$ goes to $\infty$) the system must be LAS.
This however is false. There are unstable nonlinear systems whose solutions starting from a neighborhood
converge to the equilibrium point. This is why in the definition of LAS we have the first condition.

LAS does not say anything about how fast solutions converge to $x_e$. The next definition involves the rate of
convergence.

Definition 9.1.5 (Local Exponential Stability (LES)) Let $x_e$ be an equilibrium point of the system given
by (9.1). $x_e$ is locally exponentially stable in the sense of Lyapunov if and only if the following two conditions
hold:

1. $x_e$ is LAS

2. There exist $\alpha > 0$ and $\gamma > 0$ such that for any convergent solution $x$ starting at $x(0)$, we have

$$\|x(t) - x_e\| \le \alpha \|x(0)\| e^{-\gamma t}$$

for all $t \ge 0$. ■

In the definitions given above, the initial conditions were chosen from a neighborhood of the equilibrium 
point. Hence, these notions are local in nature. The next definitions are global. 

Definition 9.1.6 (Global Asymptotic Stability (GAS)) Let $x_e$ be an equilibrium point of the system (9.1).
$x_e$ is globally asymptotically stable in the sense of Lyapunov if and only if the following two conditions hold:

1. $x_e$ is LS

2. Any solution $x$ satisfies

$$\lim_{t \to \infty} x(t) = x_e$$



Compare this definition with LAS definition. 

Example 9.1.5 The system $\dot{x} = -x$ is GAS. The damped simple pendulum

$$\dot{x}_1 = x_2$$

$$\dot{x}_2 = -\sin x_1 - x_2$$

is LAS but not GAS because it has more than one equilibrium point. In fact, a necessary condition for GAS
is the existence of one and only one equilibrium point. ■

Definition 9.1.7 (Global Exponential Stability (GES)) Let $x_e$ be an equilibrium point of the system given
by (9.1). $x_e$ is globally exponentially stable (in the sense of Lyapunov) if and only if the following two
conditions hold:

1. $x_e$ is GAS

2. There exist $\alpha > 0$ and $\gamma > 0$ such that for any solution $x$ starting at $x(0)$, we have

$$\|x(t) - x_e\| \le \alpha \|x(0)\| e^{-\gamma t}$$

for all $t \ge 0$. ■



9.1.1 Relations between stability notions 

The different notions of stability defined above are related in the following way. 

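$$\begin{array}{ccccccc} \text{LES} & \Rightarrow & \text{LAS} & \Rightarrow & \text{LS} & \Rightarrow & \text{Locally Bounded} \\ \Uparrow & & \Uparrow & & \Uparrow & & \Uparrow \\ \text{GES} & \Rightarrow & \text{GAS} & \Rightarrow & \text{GS} & \Rightarrow & \text{Bounded} \end{array}$$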

The general idea is that global properties imply local properties, and exponential convergence implies 
asymptotic convergence. The converse is not true in general. For (finite dimensional) linear systems, the 
converse is true. That is, local implies global, and asymptotic implies exponential. This is discussed in the 
next section. 



9.2 LTI systems 

When the system under consideration is LTI, the nonlinear differential equations of (9.1) become:

$$\dot{x} = Ax, \quad x(0) = x_0 \qquad (9.2)$$

where $A$ is a constant matrix. Applying the results of Chapter 8, the solution $x$ can be written as:

$$x(t) = e^{At} x_0$$

This explicit formula for the solution, which is unavailable in the general nonlinear case, can be used to 
great advantage in studying LTI stability. In fact, we shall use this formula and properties of the exponential 
function to derive several simplifications. 

Boundedness of solutions and LS are different concepts for nonlinear systems. This is illustrated by the Van
der Pol oscillator of Example 9.1.4. They are equivalent concepts for LTI systems.

Theorem 9.2.1 (LS vs. bounded solutions) The following statements are true for the LTI system (9.2).

1. 0 is LS if and only if the solution emanating from any initial condition is bounded.

2. An equilibrium point is LS if and only if the solution emanating from any initial condition is bounded.

3. An equilibrium point is LS if and only if every equilibrium point is LS. ■

The proof of this theorem relies on two properties of linear systems: (i) scaling initial conditions simply scales
the solution by the same amount, and (ii) if the solutions starting from a finite set of initial conditions are
bounded, then so are the solutions starting from any linear combination of the initial conditions. These are
properties not satisfied by general nonlinear systems. Statement 3 is a trivial consequence of Statements 1
and 2. We have stated it because it indicates that for LTI systems, LS is a property of the system and is
independent of the equilibrium point.

The next result shows that local and global notions of stability are equivalent for LTI systems. 

Theorem 9.2.2 (Local vs. global) The following statements are true for the LTI system (9.2).

1. 0 is LAS if and only if 0 is GAS.

2. 0 is LAS if and only if 0 is LES. ■

These statements suggest that there is no need to distinguish between LAS, GAS, LES and GES for LTI
systems. They all mean the same thing. The proof again relies on linearity. For instance, to show GAS from
LAS, we use a scaling argument. The equivalence of LAS and LES is somewhat different and involves
estimating the norm of the transition matrix.

The results cited above allow us to compile the following table for LTI systems. 

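$$\begin{array}{ccccccc} \text{LES} & \Leftrightarrow & \text{LAS} & \Rightarrow & \text{LS} & \Leftrightarrow & \text{Locally Bounded} \\ \Updownarrow & & \Updownarrow & & \Updownarrow & & \Updownarrow \\ \text{GES} & \Leftrightarrow & \text{GAS} & \Rightarrow & \text{GS} & \Leftrightarrow & \text{Bounded} \end{array}$$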

9.3 Lyapunov's direct method 

There are two ways to establish asymptotic stability of a nonlinear system. The first method, called Lyapunov's
direct method, is based on constructing an energy-like function known as the Lyapunov function and
showing that energy decreases along the trajectories of the system. This idea originated from mechanical
systems where the notions of kinetic and potential energies are well-established. In mechanical systems, the
total energy, which is the sum of kinetic and potential energies, always decreases with time due to friction
and other dissipative effects. Since energy cannot dissipate forever, the system eventually settles down to a
minimum energy state. This is the behavior after which asymptotic stability was modeled. A good example
is a simple mass-spring-damper system. The second method, called Lyapunov's indirect method, is based
on linearizing the nonlinear system and studying the properties of the linearization. We will consider this
method in more detail in the next chapter.

Consider the continuous-time system in (9.1). Let $x_e$ be an equilibrium point and $N$ be an open neighborhood
of $x_e$.

Definition 9.3.1 (Lyapunov function) A continuously differentiable function $V : N \to \mathbb{R}$ is a Lyapunov
function for the system (9.1) if and only if it has the following two properties:

1. Positive Definiteness: $V(x) > 0$ for all $x \neq x_e$ in $N$

2. Dissipativeness: $\dot{V}(x) < 0$ in $N$. That is, the rate of change of $V$ along the solutions of the system
(9.1) emanating from points in $N$ is less than zero for all time. ■

Example 9.3.1 Consider the system $\dot{x} = -x$. The function $V$ defined as $V(x) = x^2$ is a Lyapunov function
for this system. The system $\dot{x} = x$ has no Lyapunov function. ■
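As a worked check of the dissipativeness condition (a short derivation added for illustration, using only the chain rule), differentiate $V(x) = x^2$ along the solutions of $\dot{x} = -x$:

```latex
\dot{V}(x) = \frac{dV}{dx}\,\dot{x} = 2x \cdot (-x) = -2x^2 < 0 \quad \text{for all } x \neq 0
```

so $V$ decreases along every nonzero solution. Applying the same candidate to $\dot{x} = x$ gives $\dot{V}(x) = 2x^2 > 0$ for $x \neq 0$, so this particular $V$ fails the second property there, in line with the claim that the second system has no Lyapunov function.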

Lyapunov functions are extremely difficult to construct for general nonlinear systems. We must search over 
all positive definite functions to find one that is dissipative. The set of functions that are positive definite is 
of infinite dimension making numerical search usually difficult. Even when physical notions of energy are 
known as in mechanical systems, a great deal of intuition is required to come up with the correct function. 
This is because mechanical energy is not always the Lyapunov function. But, a great simplification occurs 
in the case of LTI systems. In this case, the two methods of Lyapunov are identical and we can eliminate 
functions of higher degree from the search for Lyapunov function. We discuss this below. 

Definition 9.3.2 (Quadratic Lyapunov function) A Lyapunov function of the form 

V(x) = x*Px 

where P is a positive definite matrix is called a quadratic Lyapunov function. The matrix P is called the 
corresponding Lyapunov matrix. ■ 

Theorem 9.3.1 (Quadratic Lyapunov functions and LTI systems) Consider the LTI system:

$$\dot{x} = Ax$$

The following statements are equivalent.

1. The LTI system is asymptotically stable.

2. The LTI system has a quadratic Lyapunov function. ■

So, in the LTI case, we do not have to search over all positive definite functions. It is enough to consider 
quadratic Lyapunov functions. The next chapter will show how to construct such functions by solving linear 
equations. 



Chapter 10 

Lyapunov Stability of LTI Systems 



Stability as defined in the last chapter is a system theoretic property. It deals with the behavior of system 
trajectories. An important feature of Lyapunov's direct and indirect approaches is that stability is determined 
without actually solving the system equations for the trajectories. Stability properties are deduced from 
other characteristics of the system. A complete study of stability is possible using these methods for the LTI 
system: 

$$\dot{x} = Ax, \quad x(0) = x_0 \qquad (10.1)$$

More importantly, we shall show that stability is equivalent to certain linear algebraic properties of the 
matrix A. 

10.1 Lyapunov equation & inequality 

Let A and Q be given n x n matrices. An equation of the form 

AP + PA* + Q = 0 

is called the continuous-time Lyapunov equation. Here, P is the solution. The discrete-time Lyapunov 
equation has the form: 

P - APA* = Q 

Lyapunov equations are linear in the (unknown) $P$. As with any linear system of equations, Lyapunov
equations may or may not have solutions. For example, if $A = 0$ and $Q = 1$, there exists no solution for the
continuous-time Lyapunov equation. On the other hand, if $A = 0$ and $Q = 0$, any scalar is a solution of the
continuous-time Lyapunov equation. The following result shows when a solution exists.

Theorem 10.1.1 (Existence of solutions of Lyapunov equations) Let $A$ be a given $n \times n$ complex matrix.
The following statements are true.

1. For each $n \times n$ complex matrix $Q$, the continuous-time Lyapunov equation

$$AP + PA^* + Q = 0$$

has a unique solution $P$ if and only if

$$\lambda_i + \bar{\lambda}_j \neq 0$$

for each pair of eigenvalues $\lambda_i, \lambda_j$ of $A$.

2. For each $n \times n$ complex matrix $Q$, the discrete-time Lyapunov equation

$$P - APA^* = Q$$

has a unique solution $P$ if and only if

$$\lambda_i \bar{\lambda}_j \neq 1$$

for each pair of eigenvalues $\lambda_i, \lambda_j$ of $A$. ■

Let $A$ be a real matrix. Then $\lambda$ is an eigenvalue of $A$ if and only if $\bar{\lambda}$ is an eigenvalue of $A$. Therefore, in
this case, statement 1 of the theorem says that the Lyapunov equation has a unique solution if and only if the
eigenvalues of $A$ are distributed in the complex plane in such a way that no two of them add up to zero, i.e.,
they are not distributed symmetrically about the imaginary axis. Similar statements about the distribution of
eigenvalues can be made in the case when $A$ is complex and in the discrete-time case.

Suppose that the eigenvalues of $A$ are contained in the open left half plane, i.e.,

$$\lambda_i(A) \in \{s : \text{real part of } s < 0\}$$

Pick any two eigenvalues of $A$, say $\lambda_1$ and $\lambda_2$. They are in general complex numbers and can be written as:

$$\lambda_1 = \sigma_1 + j\omega_1 \quad \text{and} \quad \lambda_2 = \sigma_2 + j\omega_2$$

where the real parts $\sigma_1$ and $\sigma_2$ are both strictly negative. So,

$$\lambda_1 + \bar{\lambda}_2 = \sigma_1 + j\omega_1 + \sigma_2 - j\omega_2 = (\sigma_1 + \sigma_2) + j(\omega_1 - \omega_2)$$

cannot equal 0 because the real part is the sum of two numbers that are both strictly negative. Thus, any
$A$ whose eigenvalues are contained in the open left half plane satisfies the requirement of statement 1 of
Theorem 10.1.1 for the existence of a unique solution. We can actually say more as the following theorem
shows.

Theorem 10.1.2 (Explicit formula for the solution of Lyapunov equation) Let $A \in \mathbb{C}^{n \times n}$ and
$Q \in \mathbb{C}^{n \times n}$ be given. The following statements are true.

1. Suppose that the eigenvalues of $A$ are contained in the open left half plane. Then,

$$P = \int_0^\infty e^{At} Q e^{A^* t}\,dt$$

exists as a matrix and is the unique solution of the continuous-time Lyapunov equation $AP + PA^* + Q = 0$.

2. Suppose that the eigenvalues of $A$ are contained in the open unit disk. Then,

$$P = \sum_{k=0}^{\infty} A^k Q (A^*)^k$$

exists as a matrix and is the unique solution of the discrete-time Lyapunov equation $P - APA^* = Q$.



These expressions for the solution are not meant for computations. We would compute a solution by solving 
the Lyapunov equation using linear algebra like any other equation. The formula is useful for theoretical 
applications and will be used frequently. 
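
The remark about solving the Lyapunov equation "like any other equation" can be sketched as follows. This is a minimal illustration, not the method of the text: it assumes a real matrix $A$ (so that $A^* = A^T$), treats the $n^2$ entries of $P$ as unknowns, and solves the resulting linear system exactly with rational arithmetic. The function names `solve_linear` and `solve_lyapunov` and the example matrices are hypothetical.

```python
from fractions import Fraction

def solve_linear(M, b):
    """Solve M x = b by Gauss-Jordan elimination (M square and nonsingular)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def solve_lyapunov(A, Q):
    """Solve A P + P A^T + Q = 0 for a real square A (assumption: real data)."""
    n = len(A)
    M = [[Fraction(0)] * (n * n) for _ in range(n * n)]
    b = []
    for i in range(n):
        for j in range(n):
            row = M[i * n + j]                       # equation for entry (i, j)
            for k in range(n):
                row[k * n + j] += Fraction(A[i][k])  # (A P)[i][j] = sum_k A[i][k] P[k][j]
                row[i * n + k] += Fraction(A[j][k])  # (P A^T)[i][j] = sum_k P[i][k] A[j][k]
            b.append(-Fraction(Q[i][j]))
    p = solve_linear(M, b)
    return [p[i * n:(i + 1) * n] for i in range(n)]

# Example with eigenvalues -1 and -3 (open left half plane), Q = I.
A = [[-1, 2], [0, -3]]
Q = [[1, 0], [0, 1]]
P = solve_lyapunov(A, Q)
```

If the eigenvalue condition of Theorem 10.1.1 fails, the linear system is singular and `solve_linear` raises an error instead of returning a solution.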

We now turn to the Lyapunov inequality. Recall that a real matrix is positive if and only if it is symmetric and
all its eigenvalues are positive. Similarly, a complex matrix is positive if and only if it is Hermitian and all
its eigenvalues are positive. Now, given two Hermitian matrices $M$ and $N$, we say that $M$ is greater than or
equal to $N$ (denoted by $M \geq N$) if and only if $M - N$ is Hermitian with nonnegative eigenvalues. That is,

$$M \geq N \iff M - N \geq 0$$

We say that $M$ is less than or equal to $N$ (denoted by $M \leq N$) if $N - M \geq 0$. The strict versions
$M > N$ and $M < N$ require $M - N$ and $N - M$, respectively, to be positive.

Let A be a given n x n matrix. An inequality of the form 

AP + PA* < 0 

is called the continuous-time Lyapunov inequality. The discrete-time Lyapunov inequality has the form: 

P - APA* < 0 

Solving these inequalities numerically is only a little more difficult than solving Lyapunov equations. But they
have advantages that far exceed those of equations and, as a result, are currently the preferred way to study
stability and related properties.



10.2 Main stability theorems for continuous-time LTI systems 

As the first of many algebraic characterizations of system-theoretic properties, we give testable necessary 
and sufficient conditions for stability in terms of the eigenvalues of the system matrix A. Some properties 
of stable systems will also be given. Recall that the LTI system (10.1) has the solution: 

$$x(t) = e^{At} x_0$$

where $x_0$ is the initial condition.

Theorem 10.2.1 (Continuous-time case) The following statements are true of LTI systems.

1. All solutions are bounded if and only if for each eigenvalue $\lambda_i$ of $A$, we have:

a. The real part of $\lambda_i$ is $\leq 0$ and

b. If the real part of $\lambda_i$ is equal to zero, then its algebraic and geometric multiplicities are the same.

2. A system is stable about an equilibrium point if and only if all solutions are bounded.

3. If an equilibrium point is AS, then it is zero and it is the only equilibrium point.

4. A system is AS if and only if the real parts of the eigenvalues of $A$ are strictly less than zero (i.e., all the
eigenvalues of $A$ are in the open left half plane).

5. A system is AS if and only if $e^{At}$ tends to zero as $t$ tends to $\infty$. ■

All the statements above can be proved by transforming to a new set of coordinates where the system matrix
is the Jordan matrix associated with $A$. That is, introduce the similarity transformation $\tilde{x} = T^{-1}x$ where $T$
is such that $\Lambda = T^{-1}AT$ and $\Lambda$ is Jordan. Note that $\Lambda$ is: (a) upper-triangular, (b) the main diagonal
contains the eigenvalues of $A$ and (c) the diagonal above the main diagonal consists of 0s and 1s. As a
result, $e^{\Lambda t}$ is: (a) upper-triangular, (b) the main diagonal contains $e^{\lambda_i t}$, (c) the diagonal above contains 0s
and $t e^{\lambda_i t}$, (d) the diagonal above the last one contains 0s and $t^2 e^{\lambda_i t}/2$ and so on.
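
To make the structure of the exponential concrete, here is a worked example for a single $3 \times 3$ Jordan block. The symbol $J$ is an assumed name for the block, added for illustration only.

$$J = \begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix}, \qquad e^{Jt} = e^{\lambda t} \begin{bmatrix} 1 & t & t^2/2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix}$$

If the real part of $\lambda$ is strictly negative, every entry tends to zero as $t$ tends to $\infty$ because the exponential decay dominates the polynomial growth. If the real part of $\lambda$ is zero, then $|e^{\lambda t}| = 1$ and the entries $t$ and $t^2/2$ grow without bound. Solutions stay bounded in that case only when every Jordan block of $\lambda$ has size 1, which is the requirement that the algebraic and geometric multiplicities coincide.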

Statements 1 and 2 of the theorem show when a system is stable. In order to check stability, we need to
compute the eigenvalues of $A$ as well as their geometric and algebraic multiplicities. This is an exceedingly
difficult task from a numerical point of view. The fourth statement provides an easily testable condition for
AS. In this case, we simply need to compute the eigenvalues of $A$ and check if their real parts are strictly
negative.

We now give other characterizations in terms of Lyapunov equations and inequalities. The difference
between stability and AS apparent in the above theorem will make us consider the cases separately.

Theorem 10.2.2 (Continuous-time LTI asymptotic stability) The following statements are equivalent. 

1. The eigenvalues of A are contained in the open left half plane. 

2. There exists P > 0 such that AP + PA* < 0 

3. There exists Q > 0 such that QA + A*Q < 0 

4. For each R > 0, there exists P > 0 such that AP + PA* + R = 0 

5. For each S > 0, there exists Q > 0 such that QA + A*Q + S = 0 ■ 

Theorem 10.2.3 (Continuous-time LTI stability) The following statements are equivalent.

1. The LTI system is stable.

2. There exists $P > 0$ such that $AP + PA^* \leq 0$. ■

10.3 Two stability related properties of systems 

Suppose that a system is stable in some coordinate system. That is,

$$\dot{x} = Ax$$

and the eigenvalues of $A$ are in the closed left half plane. Now for some reason, we don't like the coordinates
$x$ and wish to express the system in some other coordinate system $z$. It would be nice if stability notions
were independent of coordinate systems. In other words, we would like stability to be a property of the
system arising from the underlying physics rather than the coordinates chosen to express the system in. This
turns out to be true.

Definition 10.3.1 (LTI coordinate transforms) Consider an LTI system (10.1). Let $T$ be an invertible
matrix and define the matrix $\tilde{A}$ as: $\tilde{A} = T^{-1}AT$. The matrix $T$ is called the coordinate (or similarity or
Lyapunov) transformation, $\tilde{A}$ is said to be similar to $A$, the state vector

$$\tilde{x} = T^{-1}x$$

is the transformed coordinates and the LTI system

$$\dot{\tilde{x}} = \tilde{A}\tilde{x}$$

is the transformed system. ■

The next theorem shows that stability is invariant under coordinate transformations. 

Theorem 10.3.1 (Coordinate invariance) Consider the systems

$$\dot{x} = Ax \quad \text{and} \quad \dot{\tilde{x}} = \tilde{A}\tilde{x}$$

where the coordinates $x$ and $\tilde{x}$ are related through a similarity transformation $T$ (that is, $\tilde{x} = T^{-1}x$). Then,
$\dot{x} = Ax$ is stable if and only if $\dot{\tilde{x}} = \tilde{A}\tilde{x}$ is stable. Further, $\dot{x} = Ax$ is asymptotically stable if and only if
$\dot{\tilde{x}} = \tilde{A}\tilde{x}$ is asymptotically stable. ■

The proof is easy and involves showing that eigenvalues and their multiplicities are invariant under similarity
transforms.

Any reasonable model of a real system must have uncertainty which can be parametric or non-parametric.
Consider the system:

$$\dot{x} = (A + \Delta A)\,x$$

where $A \in \mathbb{R}^{n \times n}$ is the nominal system model and $\Delta A$ is an unknown matrix representing parametric
uncertainty (i.e., uncertainty in the elements of $A$). Usually, it is known that $\Delta A$ belongs to a collection of
matrices $\boldsymbol{\Delta}$. For example, consider the second order system:



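$$\dot{x} = \begin{bmatrix} 0 & \omega_n \\ -\omega_n & 0 \end{bmatrix} x$$

where the natural frequency $\omega_n$ is unknown, but belongs to the interval $[\omega_{\min}, \omega_{\max}]$. Define

$$\boldsymbol{\Delta} = \left\{ \begin{bmatrix} 0 & \omega \\ -\omega & 0 \end{bmatrix} : \omega \in [\omega_{\min}, \omega_{\max}] \right\}$$

and note that we can write the system as:

$$\dot{x} = (0 + \Delta A)\,x$$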



where $\Delta A \in \boldsymbol{\Delta}$. We will say that the uncertain system is robustly stable if it is AS for every possible
$\Delta A \in \boldsymbol{\Delta}$ (actually we need a little more). This means that no matter what the values of the actual parameters
are, as long as they are in the admissible set $\boldsymbol{\Delta}$, the real system will be AS. The next result shows that every
nominal system that is AS can admit some uncertainties, that is, it is robustly stable.

Theorem 10.3.2 (Robustness) Suppose that the nominal system $\dot{x} = Ax$ is AS. Then, there exists $\epsilon > 0$
such that for any $\Delta A$ satisfying

$$\|\Delta A\| < \epsilon$$

we have that the perturbed system

$$\dot{x} = (A + \Delta A)\,x$$

is AS.
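
The scalar case shows how the bound $\epsilon$ arises. This is a worked example added for illustration; the symbols $a$ and $\delta$ are assumed names for the scalar nominal model and its real perturbation.

$$\dot{x} = (a + \delta)\,x, \qquad a < 0$$

The perturbed system is AS if and only if $a + \delta < 0$. Choosing $\epsilon = |a|$, any perturbation with $|\delta| < \epsilon$ gives $a + \delta < a + |a| = 0$, so the perturbed system is AS. The admissible perturbation size is therefore set by how far the nominal eigenvalue lies from the imaginary axis.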



10.4 Discrete-time LTI systems 

As in the continuous-time case, stability of LTI discrete-time systems is intimately related to the location 
of eigenvalues of A. In the continuous-time case, eigenvalues must be in the (closed) left half plane for 
stability; whereas in the discrete-time case, eigenvalues must be in the (closed) unit disc for stability. The 
following theorem is the discrete-time analog of the main theorem of the previous section. 

Theorem 10.4.1 Consider the LTI system:

$$x_{k+1} = Ax_k$$

The following statements are true.

1. All solutions of the system are bounded if and only if for each eigenvalue $\lambda_i$ of $A$, we have

a. The magnitude of $\lambda_i$ is $\leq 1$ and

b. If the magnitude of $\lambda_i$ is equal to 1, then its algebraic and geometric multiplicities are the same.

2. The system is stable about an equilibrium point if and only if all solutions are bounded.

3. The system is AS if and only if the eigenvalues of $A$ have magnitude strictly less than 1.

4. The system is AS if and only if $A^k$ tends to zero as $k$ tends to $\infty$. ■

All of the results stated in the previous section have discrete-time analogs. 
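
As one illustration of a discrete-time analog, the series of Theorem 10.1.2 can be checked numerically on a small example. The sketch below is only a check of the formula, not a recommended way to compute $P$. It assumes a real $2 \times 2$ matrix with eigenvalues 0.5 and -0.4 (inside the open unit disk); the helper names and the number of terms are hypothetical choices.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def discrete_lyapunov_series(A, Q, terms=200):
    """Approximate P = sum_{k>=0} A^k Q (A^T)^k by truncating the series."""
    n = len(A)
    At = transpose(A)
    term = [row[:] for row in Q]            # k = 0 term: Q itself
    P = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        term = matmul(matmul(A, term), At)  # next term: A (A^k Q (A^T)^k) A^T
    return P

A = [[0.5, 0.2], [0.0, -0.4]]               # upper triangular: eigenvalues 0.5 and -0.4
Q = [[1.0, 0.0], [0.0, 1.0]]
P = discrete_lyapunov_series(A, Q)

# Residual of the discrete-time Lyapunov equation P - A P A^T = Q.
APAt = matmul(matmul(A, P), transpose(A))
residual = max(abs(P[i][j] - APAt[i][j] - Q[i][j]) for i in range(2) for j in range(2))
```

The terms of the series shrink geometrically because $A^k$ tends to zero, which is statement 4 of Theorem 10.4.1 at work.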

10.5 Lyapunov's indirect method 

The following theorem explains why linear systems are important. It is known as Lyapunov's first method
or Lyapunov's indirect method.

Theorem 10.5.1 Consider the nonlinear system in (9.1) and let $x_e$ be an equilibrium point. Assume that $f$
is continuously differentiable in a neighborhood of $x_e$. Denote by

$$\dot{x} = Ax$$

the linearization of (9.1) about $x_e$. The following statements are true:

1. If the linearization is AS about 0, then the nonlinear system is LAS about $x_e$.

2. If the linearization is unstable about 0, then the nonlinear system is unstable about $x_e$. ■

As an example of the usefulness of this theorem, consider the Van der Pol oscillator in Example 9.1.4.
Linearization about the equilibrium point (0, 0) gives:

$$\dot{x} = \begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix} x$$

which can be shown to be unstable about (0, 0). Hence, the Van der Pol oscillator is unstable about (0, 0).

Note that the theorem allows us to infer stability of the nonlinear system from the stability of the
linearization. The converse (stability of the linearization from the stability of nonlinear system) is not
necessarily true as the following example illustrates.

Example 10.5.1 The nonlinear system $\dot{x} = -x^3$ can be shown (show this) to be LAS about 0. But, the
linearization about 0 is $\dot{x} = 0$, which is not AS. ■
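
One way to carry out the suggested exercise is to solve the equation in closed form by separation of variables. This is a sketch added for illustration, with $x_0$ denoting the initial condition $x(0)$.

$$\frac{dx}{x^3} = -dt \;\Longrightarrow\; \frac{1}{x(t)^2} = \frac{1}{x_0^2} + 2t \;\Longrightarrow\; x(t) = \frac{x_0}{\sqrt{1 + 2x_0^2 t}}$$

So $|x(t)| \leq |x_0|$ for all $t \geq 0$ and $x(t)$ tends to zero as $t$ tends to $\infty$. The decay is of order $1/\sqrt{2t}$ rather than exponential, which is consistent with a linearization that has its eigenvalue at zero.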

Chapter 11

Controllability and Stabilizability 



Consider the LTI system: 



$$\dot{x} = Ax + Bu \tag{11.1}$$

where $x(t) \in \mathbb{R}^n$ is the state vector and $u(t) \in \mathbb{R}^m$ is the control input vector. In the previous chapters,
$u$ was simply referred to as the input vector; here in this chapter, it will be thought of as a control input.
A control input is a vector of signals that the designer can choose in order to achieve some goal. This is
unlike external disturbances and measurement noises which are not at our discretion (such signals are called
exogenous inputs).

Recall that Lyapunov stability dealt with system behavior in the vicinity of an operating point. As
circumstances change, we may have to transition from the current operating point to another, or more specifically,
from one state to another. The problem of taking a system from one state to another in a safe and sound 
manner is a problem in control design. This is a hard problem and we will not attempt to solve it. Note that 
if there is no system trajectory passing through the initial and final states, then we cannot hope to transition 
at all. The controllability problem is to solve this easier problem of checking if there is a control input that 
takes the system from an initial state to a final state. It makes no claims about the safety and soundness of 
transition or about what happens upon reaching the final state. 



We begin with the definition of controllability. Recall that, given any initial condition $x(0) = x_0$, the system
equation (11.1) can be solved to get:

$$x(t) = e^{At}x_0 + \int_0^t e^{A(t-\tau)}Bu(\tau)\,d\tau \tag{11.2}$$

where the first term is the initial condition (or zero input) response and the second term is the forced (or zero
state) response.

Definition 11.1.1 (Controllable) The LTI system (11.1) is controllable if and only if for any initial state
$x_0 \in \mathbb{R}^n$ and final state $x_f \in \mathbb{R}^n$, there is a finite time $T > 0$ and a control input $u$ defined in the interval
$[0, T]$ such that

$$x(T) = e^{AT}x_0 + \int_0^T e^{A(T-\tau)}Bu(\tau)\,d\tau \tag{11.3}$$

is equal to $x_f$. In other words, for any pair $(x_0, x_f)$ of points, there is a control input that drives the system
from the initial state $x_0$ to the final state $x_f$ in finite time $T$. ■

A few things in the definition are very important. The time $T$ taken to go from initial state to final state must
be finite. We may take a second or a million years; but a finite amount of time. Also, a control input must
exist for any pair of initial and final states $(x_0, x_f)$ for the system to be controllable. The control input may
not be unique, but at least one must exist. If a system is not controllable, then we say that it is uncontrollable.

Strictly speaking, in the above definition, we should say that the system is controllable during the time $[0, T]$.
For LTI systems, it turns out that the system is controllable during some time interval $[0, T]$ if and only if
it is controllable for any (nonempty) time interval. This fact will become obvious from Corollary 11.1.1.
Therefore, the concept of controllability of LTI systems is independent of how long it takes, and we say
that the system is controllable or uncontrollable. Also, from now on, we will say that the pair $(A, B)$ is
controllable (uncontrollable) to mean that the system $\dot{x} = Ax + Bu$ is controllable (uncontrollable). This
terminology has a more algebraic flavor and is in tune with our theme of algebraic characterization of system
properties.

The controllability problem requires us to find a control input that will take the system from an initial state to a
final state in $T$ seconds. Here, the time $T$, the initial state $x_0$ and the final state $x_f$ are given. The unknown
is the control input $u$ which we must determine by solving the convolution integral equation in (11.3). The
convolution integral is a linear operator in the control input and, as with all linear equations, we ask the
following questions:

1. Given a final time $T$ and a pair of initial and final states $(x_0, x_f)$, does there exist a control input that
drives the system from $x_0$ at $t = 0$ to $x_f$ at $t = T$?

2. Does there exist a control input that drives the system from $x_0$ to $x_f$ in finite time for any pair of
initial and final states $(x_0, x_f)$? That is, when is the system controllable?

3. If a control exists, what are all the control inputs that drive the system from $x_0$ at $t = 0$ to $x_f$ at $t = T$?

4. When is the control input unique?

Compare these questions with the questions for the linear equations discussed prior to Theorem 3.4.2. As
preparation, we introduce a number of definitions.

Definition 11.1.2 (Controllability matrix) Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$. The $n \times nm$ matrix

$$Q_c = [B \quad AB \quad A^2B \quad \cdots \quad A^{n-1}B]$$

is called the controllability matrix of the pair $(A, B)$. ■

Definition 11.1.3 (Controllable subspace) Range of the controllability matrix $Q_c$ is called the controllable
subspace. That is, it is the set of all points $y \in \mathbb{R}^n$ for which the linear equation:

$$Q_c v = y$$

has a solution $v \in \mathbb{R}^{nm}$. ■

Definition 11.1.4 (Finite-time controllability gramian) Let $0 < T < \infty$. The matrix

$$P_c(T) = \int_0^T e^{At}BB^*e^{A^*t}\,dt$$

is called the finite-time controllability gramian of the pair $(A, B)$. ■

Since $T$ is finite, this matrix always exists. In Chapter 10, we saw

$$\int_0^\infty e^{At} M e^{A^*t}\,dt$$

which looks similar to the controllability gramian. In fact, the above matrix is called the infinite-time
controllability gramian of the pair $(A, M^{1/2})$ where $A$ is stable. Note also that with a change of variable, we
can write $P_c(T)$ as

$$P_c(T) = \int_0^T e^{A(T-s)}BB^*e^{A^*(T-s)}\,ds$$

The next lemma is very important. It is similar to something we have seen before. In Theorem 8.1.2, we
considered the linear system $\dot{x} = Ax$. For this system, if the initial condition lies in an eigen-subspace of
A, then the initial condition response will remain in that subspace for all time. Thus, eigen-subspaces are 
invariants for the system dynamics with no inputs. Here is the analog of an invariant subspace for the system 
dynamics with inputs. 

Lemma 11.1.1 (Invariance of controllable subspace) Suppose that $x_0 = 0$. Then, for any control input
$u$, the solution

$$x(t) = \int_0^t e^{A(t-\tau)}Bu(\tau)\,d\tau$$

satisfies $x(t) \in \mathcal{R}(Q_c)$ for all $t \geq 0$. That is, for each $t \geq 0$, there exist $v_0(t), \ldots, v_{n-1}(t)$ such that:

$$x(t) = \sum_{k=0}^{n-1} A^k B v_k(t) = Q_c v(t)$$

where $v(t)$ is the vector obtained by stacking $v_0(t), \ldots, v_{n-1}(t)$ one below the other into a column. ■

Note that $x_0 = 0$ is in the range of $Q_c$ because the range of a matrix is a subspace. So, if a solution
starts at 0 which is a point in the controllable subspace, then it stays in the range of $Q_c$, equivalently, in the
controllable subspace for all time. We chose $x_0 = 0$ for simplicity. In fact, if a solution starts at any point
in the controllable subspace, then it stays in the controllable subspace for all time and for any control input.
This implies that we cannot access any point outside the controllable subspace from within. We are now
ready to state the answer to some of the questions raised earlier.

Theorem 11.1.1 (Solution of the controllability problem) Let $x_0$, $x_f$ and $T > 0$ be given. The following
statements are equivalent:

1. There exists a control input $u$ defined in the interval $[0, T]$ that drives the system from $x_0$ to $x_f$ in time
$T$.

2. $x_f - e^{AT}x_0$ is in the controllable subspace.

Moreover, if statement 2 holds, then a control input $u$ that takes $x_0$ to $x_f$ is given by:

$$u(t) = B^*e^{A^*(T-t)}y, \quad \forall t \in [0, T]$$

where $y$ is any solution of

$$P_c(T)\,y = x_f - e^{AT}x_0$$

and $P_c(T)$ is the finite-time controllability gramian. ■

Using this theorem, a control input that drives the system from $x_0$ to $x_f$ in finite time can be computed in
the following steps:

(1) Fix a final time $T > 0$ (any strictly positive number will do). Compute the controllability gramian

$$P_c(T) = \int_0^T e^{At}BB^*e^{A^*t}\,dt$$

and

$$z = x_f - e^{AT}x_0$$

(2) Check if the linear system:

$$P_c(T)\,y = z$$

has a solution. If so, find one. Use Theorem 3.4.2 of Chapter 3 for solving this linear algebra problem.

(3) Define the control input as in the previous theorem.
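
The three steps can be sketched on a double integrator, for which everything is available in closed form. This example is an assumption added for illustration: it takes $A = [[0, 1], [0, 0]]$ and $B = [0, 1]^T$, so that $e^{At} = [[1, t], [0, 1]]$, and it uses exact rational arithmetic. The names `simpson`, `u` and the chosen states are hypothetical.

```python
from fractions import Fraction

# Assumed example: double integrator, A = [[0, 1], [0, 0]], B = [0, 1]^T,
# for which e^{At} = [[1, t], [0, 1]] and e^{At} B = [t, 1]^T.
T = Fraction(2)
x0 = (Fraction(1), Fraction(0))   # start at position 1, at rest
xf = (Fraction(0), Fraction(0))   # end at the origin, at rest

# Step 1: gramian P_c(T) = int_0^T [t, 1]^T [t, 1] dt and z = x_f - e^{AT} x_0.
Pc = [[T**3 / 3, T**2 / 2], [T**2 / 2, T]]
z = (xf[0] - (x0[0] + T * x0[1]), xf[1] - x0[1])

# Step 2: solve the 2x2 system P_c(T) y = z by Cramer's rule.
det = Pc[0][0] * Pc[1][1] - Pc[0][1] * Pc[1][0]
y = ((z[0] * Pc[1][1] - Pc[0][1] * z[1]) / det,
     (Pc[0][0] * z[1] - Pc[1][0] * z[0]) / det)

# Step 3: u(t) = B^T e^{A^T (T - t)} y = (T - t) * y[0] + y[1].
def u(t):
    return (T - t) * y[0] + y[1]

def simpson(f, a, b, n=2):
    """Composite Simpson rule (n even); exact for polynomials up to degree 3."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Check against (11.3), using e^{A(T - s)} B = [T - s, 1]^T.
xT = (x0[0] + T * x0[1] + simpson(lambda s: (T - s) * u(s), Fraction(0), T),
      x0[1] + simpson(lambda s: u(s), Fraction(0), T))
```

For these values the computed input is $u(t) = 1.5t - 1.5$: it first pushes the state toward the origin and then brakes so that both position and velocity vanish at $t = T$.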



The linear system in Step 2 may or may not have a solution. If it has no solution, then there exists no control
input that drives the system from $x_0$ to $x_f$. On the other hand, if it has many solutions, then each solution
will in general give rise to a different control input that drives the system from $x_0$ to $x_f$ in $T$ seconds along
possibly different trajectories. If we compute

$$y = P_c(T)^+ z$$

where $P_c(T)^+$ is the Moore-Penrose inverse of $P_c(T)$, then the resulting control input is the control input of
least energy (in the $L_2$ sense). Finally, note that we can always define $y$ as above and calculate a control
input $u$ whether or not the linear system in Step 2 has a solution. If there is no control input that drives the
system from $x_0$ to $x_f$, then the control input calculated above brings the system from $x_0$ to a point as close
as possible to $x_f$ in $T$ seconds.

We now ask when does there exist a control input for any pair of initial and final states, equivalently, when 
is the pair (A, B) controllable. 

Corollary 11.1.1 (System controllability) The pair $(A, B)$ is controllable if and only if the range of $Q_c$ is
$\mathbb{R}^n$, equivalently, the rank of $Q_c$ is $n$. ■

Accordingly, the system can be taken from any initial state to any final state in finite time if and only if 
all the states are in the controllable subspace of the system. This makes sense intuitively. After all, by the 
invariance lemma 11.1.1, the controllable subspace cannot be left once inside. So, there can be no point 
outside it if the system is controllable. Notice that the testable condition for controllability given in the 
corollary is algebraic. We just need to determine the rank of the controllability matrix to check if the system 
is controllable or not. The next theorem gives several other algebraic characterizations of controllability. 
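
The rank test of the corollary can be sketched in a few lines. This is a minimal illustration with exact rational arithmetic; the helper names `controllability_matrix` and `rank` and the two example pairs are assumptions, not taken from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def controllability_matrix(A, B):
    """Build Q_c = [B, AB, ..., A^(n-1) B] as a list of rows."""
    n = len(A)
    blocks = []
    block = [[Fraction(v) for v in row] for row in B]
    for _ in range(n):
        blocks.append(block)
        block = matmul(A, block)            # next block: A times the previous one
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]

def rank(M):
    """Rank by Gaussian elimination; exact because entries are Fractions."""
    M = [row[:] for row in M]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][col] / M[r][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

# Double integrator: Q_c = [[0, 1], [1, 0]] has rank 2, so the pair is controllable.
rank_1 = rank(controllability_matrix([[0, 1], [0, 0]], [[0], [1]]))

# Diagonal A with an input that never reaches the second state: rank 1, uncontrollable.
rank_2 = rank(controllability_matrix([[1, 0], [0, 2]], [[1], [0]]))
```

Exact arithmetic sidesteps the question of a numerical rank tolerance, which matters when the same test is run in floating point.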

Theorem 11.1.2 (Algebraic tests for controllability) Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$ be given matrices.
The following statements are equivalent:

1. The pair $(A, B)$ is controllable.

2. Range of $Q_c = \mathbb{R}^n$ (Rank of $Q_c = n$).

3. Null space of $Q_c^* = \{0\}$.

4. $P_c(T) > 0$ for all $T > 0$.

5. Rank of

$$[\lambda I - A \quad B]$$

is $n$ for all complex numbers $\lambda$ (this is known as PBH test).

6. Rank of

$$[\lambda I - A \quad B]$$

is $n$ for all eigenvalues $\lambda$ of $A$ (this is known as modified PBH test).

7. Suppose that there exists $x$ such that $A^*x = \lambda x$ and $B^*x = 0$ for some complex number $\lambda$. Then,
$x = 0$.

8. Suppose that $(\lambda, x)$ is an eigenvalue-eigenvector pair of $A^*$. Then, $B^*x \neq 0$.

9. Given any set $\mu_1, \mu_2, \ldots, \mu_n$ of $n$ complex numbers, there exists a state feedback gain $K$ such that the
eigenvalues of $A + BK$ are $\mu_1, \mu_2, \ldots, \mu_n$ (this is known as pole placement test). ■

The last statement refers to the pole placement control problem which is to find a control law of the form:

$$u = Kx$$

called state feedback control law such that the eigenvalues of the closed loop system

$$\dot{x} = (A + BK)x$$

are at some pre-specified complex numbers. Usually, the poles of the closed loop system are chosen to lie
in the open left half plane so as to realize good transient and steady state performances such as rise time,
overshoot and zero tracking error. According to the statement, poles of the closed loop system can be placed
anywhere if and only if the pair $(A, B)$ is controllable.
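
A small worked example shows what pole placement looks like in practice. It is added for illustration and assumes the double integrator, with $k_1$, $k_2$ as assumed names for the entries of the gain.

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad K = [k_1 \quad k_2], \quad A + BK = \begin{bmatrix} 0 & 1 \\ k_1 & k_2 \end{bmatrix}$$

The closed loop characteristic polynomial is $s^2 - k_2 s - k_1$. Matching it with the desired polynomial $(s - \mu_1)(s - \mu_2) = s^2 - (\mu_1 + \mu_2)s + \mu_1\mu_2$ gives

$$k_1 = -\mu_1\mu_2, \qquad k_2 = \mu_1 + \mu_2$$

For instance, $\mu_1 = -1$ and $\mu_2 = -2$ give $K = [-2 \quad -3]$, and the closed loop polynomial is $s^2 + 3s + 2$. Every pair of desired poles leads to a gain here, as expected for a controllable pair. For a real gain $K$, complex poles have to be chosen in conjugate pairs.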

We conclude this section with the following remarks. Controllability does not say anything about what 
happens to the system after T seconds. In fact, if the simulation is continued beyond T seconds, the system 
will continue to evolve and most likely leave the final state. This is one of the reasons why we mentioned in 
the introduction that controllability does not solve the control problem of transitioning from one operating 
condition to another. Also, we did not place any restrictions on the control input that drives the system from 
initial state to final state. Large and fast control inputs cannot be applied in practical systems due to rate and 
position saturation. So, it is unrealistic to go from initial to final states in arbitrarily small time. 



11.2 Stabilizability 

We now introduce a concept that is closer to control design than controllability is. Recall that the modified
PBH test for controllability of the pair $(A, B)$ says that the pair is controllable if and only if the matrix

$$[\lambda I - A \quad B] \tag{11.4}$$

has rank $n$ for all eigenvalues $\lambda$ of $A$. Suppose that the pair $(A, B)$ is uncontrollable. Then, there exists an
eigenvalue $\lambda$ of $A$ for which the rank of the matrix in (11.4) is strictly less than $n$. This allows us to classify
eigenvalues of $A$ into two groups.

Definition 11.2.1 (Controllable and uncontrollable modes) An eigenvalue $\lambda$ of $A$ is called a controllable
mode of the pair $(A, B)$ if and only if the matrix defined in (11.4) has full rank. Otherwise, $\lambda$ is called an
uncontrollable mode of $(A, B)$. ■

Controllability implies that all the eigenvalues of A are controllable modes. This is too much to ask of real
systems where usually only a fraction of the modes are controllable. Notice that, by the pole placement test 
of Theorem 11.1.2, if a system is controllable, then a state feedback law can be found to assign poles of the 
closed loop system to any desired location and thereby achieve any level of performance. Physical systems 
come with features that limit achievable performance. One of these features is the presence of uncontrollable 
modes. 

Definition 11.2.2 (Stabilizable) A pair (A,B) is (continuous-time) stabilizable if the real part of each 
uncontrollable mode of the pair (A, B) is strictly less than 0. ■ 

This definition has a nice interpretation using pole placement. As mentioned before, if the pair $(A, B)$ is
controllable, then the poles of $A + BK$ can be placed anywhere. Thus, a controllable mode can be moved
around in the complex plane as we wish using state feedback. If a mode is uncontrollable, then it cannot be
moved by constant gain state feedback. Stabilizable means that all those modes that cannot be moved must
be stable. This is the minimum requirement for the existence of a gain $K$ so that the closed loop system
$\dot{x} = (A + BK)x$ is asymptotically stable.

The main result on stabilizability is the following (compare with the corresponding statements for
controllability):

Theorem 11.2.1 (Algebraic tests for stabilizability) Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$ be given. The
following statements are equivalent.

1. The pair $(A, B)$ is stabilizable.

2. PBH Test: Rank of

$$[\lambda I - A \quad B]$$

is $n$ for all complex numbers $\lambda$ in the closed right half plane.

3. Modified PBH Test: Rank of

$$[\lambda I - A \quad B]$$

is $n$ for all eigenvalues $\lambda$ of $A$ in the closed right half plane.

4. There exists $K$ such that all the eigenvalues of $A + BK$ have strictly negative real parts. ■

The test of stabilizability examines only the eigenvalues of $A$ that are in the closed right half plane; whereas
the corresponding test for controllability looks at all the eigenvalues of $A$. So, a controllable pair is
stabilizable. The converse is not true.
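
A two-state example illustrates why the converse fails. It is added for illustration; the matrices are assumed, not taken from the text.

$$A = \begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

$$\lambda = -1: \; [\lambda I - A \quad B] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -3 & 1 \end{bmatrix} \text{ (rank 1)}, \qquad \lambda = 2: \; [\lambda I - A \quad B] = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \text{ (rank 2)}$$

The mode $-1$ is uncontrollable, so the pair is not controllable. The only eigenvalue in the closed right half plane is 2, and it passes the modified PBH test, so the pair is stabilizable. Indeed, with $K = [k_1 \quad k_2]$ the eigenvalues of $A + BK$ are $-1$ and $2 + k_2$, and any $k_2 < -2$ makes the closed loop system AS. If instead $A$ had diagonal entries $1$ and $-2$ with the same $B$, the uncontrollable mode would be $1$ and the pair would not be stabilizable.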

Chapter 12

Observability and Detectability 



Consider the LTI system: 

$$\dot{x} = Ax, \quad x(0) = x_0 \tag{12.1a}$$
$$y = Cx \tag{12.1b}$$

where $x$ is the state vector and $y$ is the output (measurement) vector. This system has no input and is driven
by the initial state $x_0$. The state and output responses are:

$$x(t) = e^{At}x_0 \quad \text{and} \quad y(t) = Ce^{At}x_0$$

for all $t \geq 0$. So, if the initial state is known then the state $x(t)$ and the output $y(t)$ are known for all time
$t \geq 0$. The subject of this chapter is the inverse problem of determining the initial condition $x_0$ from the
output time history. If the initial condition can be uniquely determined, then

$$x(t) = e^{At}x_0$$

gives the evolution of states for all time. Thus, from the output time history, we obtain a complete picture of 
the evolution of internal system states. 

Recall that the controllability problem involves finding control inputs to transition from one state to another 
in finite time. So, in some sense, it has to do with how effective the control channels are. Observability has 
to do with how effective the sensors are. Remember that a state is a representation of the internal workings 
of a system. Unfortunately, we cannot measure every state in practical applications either because there 
are too many states or because it is not physically possible. So, from a limited number of sensor read-outs 
which are algebraic combinations of states (y = Cx) , we must determine exactly what is going on inside 
the system. This is the observability problem in a nutshell. It is a simpler problem than the practical problem 
of sensor selection and placement to observe a system. 

The practical problem is also complicated by the presence of measurement noise which is neglected here. 
Noise will always prevent us from knowing the system states exactly. But, in applications, exact knowledge 
of the states is not required. We only need to know an estimate of the state. This problem of estimating states
from noisy measurements is known as the state estimation or filtering problem which has been extensively 
studied. Although we will not discuss filtering in this chapter, it should be noted that the concepts of 
observability and detectability are essential for filtering. 



12.1 Observability 



Definition 12.1.1 (Observable) The LTI system (12.1) is observable if and only if there exists a finite time
$T > 0$ such that $y(t) = 0$ for all $t \in [0, T]$ implies that $x_0 = 0$. That is, the system is observable if and
only if the only initial condition that results in zero output over a nonempty time interval is the zero initial
condition.



The definition considers only a specific output response, namely, zero output. This may look restrictive; but
by linearity, the definition is equivalent to the following: Suppose that $y_1$ and $y_2$ are outputs generated with
initial states $x_1$ and $x_2$. Then, $y_1(t) = y_2(t)$ over a nonempty interval implies that $x_1 = x_2$. The main
reason for stating the definition in terms of $y = 0$ is that zero output provides no information about the
internal states and is thus the worst output. If the initial condition (and hence the internal state evolution)
can be uniquely determined even for the worst output, then it can be determined when the output is not
identically zero. If a system is not observable, then we say that it is unobservable.

Strictly speaking, in the above definition, we should say that the system is observable during the time $[0, T]$.
For LTI systems, it turns out that the system is observable during some nonempty time interval $[0, T]$ if and
only if it is observable for any nonempty time interval. So, we shall simply say that the system is observable
(or unobservable). Also, from now on, we will say that the pair $(C, A)$ is observable (or unobservable) to
mean that the system in (12.1) is observable (or unobservable).



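Definition 12.1.2 (Observability matrix) Let $A \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{m \times n}$. The $nm \times n$ matrix

$$Q_o = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix}$$

is called the observability matrix of the pair $(C, A)$.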



Definition 12.1.3 (Unobservable subspace) Null space of the observability matrix $Q_o$ is called the
unobservable subspace. That is, unobservable subspace is the set of all points $x \in \mathbb{R}^n$ such that $Q_o x = 0$.

Definition 12.1.4 (Finite-time observability gramian) Let $0 < T < \infty$. The matrix

$$P_o(T) = \int_0^T e^{A^*t}C^*Ce^{At}\,dt$$

is called the finite-time observability gramian of the pair $(C, A)$.

The matrix

$$P_o = \int_0^\infty e^{A^*t}C^*Ce^{At}\,dt$$

which exists if $A$ is stable is called the infinite-time observability gramian of the pair $(C, A)$. As before,
with a change of variable, we can write $P_o(T)$ as

$$P_o(T) = \int_0^T e^{A^*(T-s)}C^*Ce^{A(T-s)}\,ds$$

Recall from the last chapter on controllability that if a system starts out in the controllable subspace, then 
it cannot leave that subspace no matter what control input is applied. The analogous question here is what 
happens if we start in the unobservable subspace. 

Lemma 12.1.1 (Invariance of unobservable subspace) Suppose that $x_0$ is in the unobservable subspace.
Then, $y(t) = 0$ for all time $t \geq 0$.

The lemma means the following. If a point in the null space of $Q_o$ (that is, a point in the unobservable
subspace) is chosen as the initial state, then the output will be zero for all time. Another way of saying this
is that if the output is not zero for some time $t$, then the initial state is not in the unobservable subspace.
We should note two things here. First, (and this is very important) there is no such thing as an observable
subspace (hence we say not in the unobservable subspace). The second thing is that if $y(t_1) \neq 0$ for some
time $t_1$, then there is a time interval containing $t_1$ where $y(t) \neq 0$. This follows from the fact that $y$ is
infinitely differentiable.

Theorem 12.1.1 (Solution of the observability problem) Let the output y be given over a nonempty
interval [0, T]. The following statements are equivalent:

1. There exists an initial state $x_0$ such that $y(t) = C e^{At} x_0$ for all $t \in [0, T]$.

2. $\int_0^T e^{A^* t} C^* y(t) \, dt$ is in the range of $Q_o^*$.

Moreover, if statement 2 holds, then all initial states $x_0$ that produce the output y are given by:

$$x_0 = v + P_o(T)^{+} \int_0^T e^{A^* t} C^* y(t) \, dt$$

where v is an arbitrary vector in the null space of $Q_o$ and $P_o(T)^{+}$ is the Moore-Penrose inverse of the
observability gramian $P_o(T)$.
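
To see where the formula comes from, the following sketch starts from statement 1 and integrates against $e^{A^* t} C^*$. It relies on the standard fact, assumed here rather than proved, that the null space of $P_o(T)$ coincides with the null space of $Q_o$.

```latex
\begin{align*}
y(t) &= C e^{At} x_0, \qquad t \in [0, T] \\
\int_0^T e^{A^* t} C^* y(t)\, dt
  &= \left( \int_0^T e^{A^* t} C^* C e^{A t}\, dt \right) x_0
   = P_o(T)\, x_0 \\
x_0 &= v + P_o(T)^{+} \int_0^T e^{A^* t} C^* y(t)\, dt,
  \qquad P_o(T)\, v = 0
\end{align*}
```

The last line is the general solution of the consistent linear equation $P_o(T) x_0 = b$: a particular solution $P_o(T)^{+} b$ plus any element of the null space.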

We now ask when the initial state is unique or, equivalently, when the pair (C, A) is observable.

Corollary 12.1.1 (System observability) The pair (C, A) is observable if and only if the null space of $Q_o$
is {0}, or equivalently, the rank of $Q_o$ is n.

According to the corollary, the initial system state and its subsequent evolution can be uniquely determined
from the outputs if and only if the unobservable subspace contains nothing other than 0. This is intuitively
clear because by the invariance Lemma 12.1.1 any initial state in the unobservable subspace gives zero
output. Suppose that an initial state $x_0$ generates the output time history y. Then, since y = y + 0 and by
linearity of the system, any initial condition of the form $x_0 + v$ where v is in the unobservable subspace
(and, hence, its contribution to the output is zero) also generates y as the output. So, for us to be able to
uniquely determine the initial state, all initial conditions of the form $x_0 + v$ must collapse to the same point
$x_0$. This means that the only element v in the unobservable subspace is 0.

As in the case of controllability, the testable condition in the corollary is algebraic. We simply need to check 
if the rank of the observability matrix is equal to n to determine if the system is observable or not. The next 
theorem presents a few more algebraic characterizations. Compare the statements with the corresponding 
statements for controllability in Chapter 11.
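
The rank test can be sketched in a few lines of code. This is a minimal illustration, not the text's own procedure: it assumes the observability matrix is the stack of $C, CA, \ldots, CA^{n-1}$, uses exact rational arithmetic from the Python standard library, and the matrices are made-up example data. The helper names `matmul`, `rank`, `observability_matrix` and `is_observable` are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def observability_matrix(C, A):
    """Stack C, CA, ..., CA^(n-1), where A is n x n (assumed form of Q_o)."""
    blocks, block = [], C
    for _ in range(len(A)):
        blocks.extend(block)
        block = matmul(block, A)
    return blocks


def is_observable(C, A):
    """Rank test: (C, A) is observable iff rank Q_o equals n."""
    return rank(observability_matrix(C, A)) == len(A)


A = [[0, 1], [-2, -3]]               # illustrative data, eigenvalues -1 and -2
print(is_observable([[1, 0]], A))    # True
print(is_observable([[1, 1]], A))    # False
```

In the second case $CA = -2C$, so the observability matrix has rank 1. This agrees with statement 7 of the theorem that follows: the eigenvector $(1, -1)$ of A for the eigenvalue $-1$ satisfies $Cx = 0$.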

Theorem 12.1.2 (Algebraic tests for observability) Let A and C be given matrices with A being n x n.
The following statements are equivalent.

1. The pair (C, A) is observable.

2. The null space of $Q_o$ is {0} (the rank of $Q_o$ is n).

3. The range of $Q_o^*$ is $\mathbb{R}^n$.

4. $P_o(T) > 0$ for all $T > 0$.

5. The rank of

$$\begin{bmatrix} \lambda I - A \\ C \end{bmatrix}$$

is n for all complex numbers $\lambda$ (this is the PBH test).

6. The rank of

$$\begin{bmatrix} \lambda I - A \\ C \end{bmatrix}$$

is n for all eigenvalues $\lambda$ of A (this is the modified PBH test).

7. Suppose that there exists x such that $Ax = \lambda x$ and $Cx = 0$. Then, x = 0.

8. Given any set $\mu_1, \mu_2, \ldots, \mu_n$ of n complex numbers, there exists an observer gain L such that the
eigenvalues of A + LC are $\mu_1, \mu_2, \ldots, \mu_n$ (this is the pole placement test).

The last statement refers to the observer design problem, which is to find an observer system:

$$\dot{\hat{x}} = A\hat{x} - L(y - C\hat{x})$$

such that the poles of the observer error system:

$$\dot{e} = (A + LC)e$$

are at some pre-specified complex numbers. Here, the observer error is $e = x - \hat{x}$. Usually, the poles of
the error system are chosen to lie in the open left half plane so as to realize good transient and steady state
performances. According to the statement, poles of the error system can be placed anywhere if and only if
the pair (C, A) is observable.
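
For a concrete feel of the pole placement statement, here is a small sketch for one hand-picked plant, a double integrator with position measurement. The plant, the desired poles, and the function names `observer_gain_double_integrator` and `eig_2x2` are assumptions made for illustration. For this plant the characteristic polynomial of A + LC is $s^2 - l_1 s - l_2$, so the gain follows by matching coefficients with $(s - \mu_1)(s - \mu_2)$.

```python
import cmath


def observer_gain_double_integrator(mu1, mu2):
    """Gain L = [l1, l2]^T that places eig(A + L C) at mu1, mu2
    for the illustrative plant A = [[0, 1], [0, 0]], C = [1, 0]."""
    # det(sI - (A + L C)) = s^2 - l1 s - l2 = s^2 - (mu1 + mu2) s + mu1 mu2
    return [mu1 + mu2, -mu1 * mu2]


def eig_2x2(M):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


A = [[0, 1], [0, 0]]
C = [1, 0]
L = observer_gain_double_integrator(-1, -2)

# Error dynamics matrix A + L C, where L C is the outer product of L and C.
A_err = [[A[i][j] + L[i] * C[j] for j in range(2)] for i in range(2)]

print(L)               # [-3, -2]
print(A_err)           # [[-3, 1], [-2, 0]]
print(eig_2x2(A_err))  # eigenvalues -1 and -2, returned as complex numbers
```

Both error poles land in the open left half plane, so the observer error decays to zero for any initial error in this example.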



12.2 Detectability 



Recall that the modified PBH test says that the pair (C, A) is observable if and only if the matrix:

$$\begin{bmatrix} \lambda I - A \\ C \end{bmatrix} \qquad (12.2)$$

has rank n for all eigenvalues $\lambda$ of A. Suppose that the pair (C, A) is unobservable. Then, there exists an
eigenvalue $\lambda$ of A for which the rank of the matrix in (12.2) is strictly less than n. This matrix involving
(C, A) allows us to put the eigenvalues of A into two groups:



Definition 12.2.1 (Observable and unobservable modes) An eigenvalue $\lambda$ of A is an observable mode of
the pair (C, A) if and only if the matrix in (12.2) has rank n. A mode of (C, A) that is not observable is
called an unobservable mode of the pair (C, A).



Observability means that all the eigenvalues of A are observable modes of the pair (C, A). As in the case 
of controllability, this is too much to ask of practical systems. Recall that, by the pole placement test of 
Theorem 12.1.2, if the system is observable, then an observer system can be designed with any level of 
performance. Physical systems come with unobservable modes that limit the level of observer performance 
that can be realized. 



Definition 12.2.2 (Detectable) A pair (C, A) is (continuous-time) detectable if the real part of each
unobservable mode of the pair (C, A) is strictly less than 0.

This definition has a nice interpretation in terms of observer design. As we know, if a mode is observable,
then it can be moved to any place in the complex plane during observer design. If a mode is unobservable,
then it cannot be moved by feedback of the output. So, for zero steady state observer error, all unobservable
modes must be stable. If the unobservable modes are stable, then their contribution to the initial condition
response decays exponentially. Hence, the observer error can be made to go to zero asymptotically.

The main result on detectability is the following (compare with the corresponding statements for
observability as well as stabilizability in Chapter 11):

Theorem 12.2.1 (Algebraic tests for detectability) Let $A \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{m \times n}$. The following
statements are equivalent.

1. The pair (C, A) is detectable.

2. The rank of

$$\begin{bmatrix} \lambda I - A \\ C \end{bmatrix}$$

is n for all complex numbers $\lambda$ in the closed right half plane (this is the PBH test).

3. The rank of

$$\begin{bmatrix} \lambda I - A \\ C \end{bmatrix}$$

is n for all eigenvalues $\lambda$ of A in the closed right half plane (this is the modified PBH test).

4. There exists L such that all the eigenvalues of A - LC are in the open left half plane.
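
A small worked example may help separate observability from detectability. The matrices below are illustrative and are not taken from the text; the ranks are those of the matrix in the modified PBH test, evaluated at each eigenvalue of A.

```latex
\[
A = \begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix}, \qquad
C_1 = \begin{bmatrix} 0 & 1 \end{bmatrix}, \qquad
C_2 = \begin{bmatrix} 1 & 0 \end{bmatrix}
\]
\[
(C_1, A):\quad
\operatorname{rank} \begin{bmatrix} 0 & 0 \\ 0 & -3 \\ 0 & 1 \end{bmatrix} = 1
\ \ (\lambda = -1), \qquad
\operatorname{rank} \begin{bmatrix} 3 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} = 2
\ \ (\lambda = 2)
\]
\[
(C_2, A):\quad
\operatorname{rank} \begin{bmatrix} 0 & 0 \\ 0 & -3 \\ 1 & 0 \end{bmatrix} = 2
\ \ (\lambda = -1), \qquad
\operatorname{rank} \begin{bmatrix} 3 & 0 \\ 0 & 0 \\ 1 & 0 \end{bmatrix} = 1
\ \ (\lambda = 2)
\]
```

Neither pair is observable. The pair $(C_1, A)$ is detectable because its only unobservable mode, $-1$, has negative real part. The pair $(C_2, A)$ is not detectable because its unobservable mode, $2$, lies in the closed right half plane.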

12.3 Duality 

Even a cursory look at the algebraic characterizations of controllability given in Chapter 11 and those of
observability in the previous sections reveals many similarities. This is in spite of their different
system-theoretic origins. Recall that controllability deals with driving the system from an initial state to a final
state by choosing a control input, whereas observability deals with finding the initial state from an observed
output time history. As we shall see below, the algebraic characterizations imply a deeper connection known
as duality.

Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$. Consider a pair (A, B) and the associated controllability matrix

$$Q_c(A, B) = \begin{bmatrix} B & AB & A^2 B & \cdots & A^{n-1} B \end{bmatrix}$$

where, for the purpose of exposition, we write $Q_c(A, B)$ to indicate that $Q_c$ depends on A and B. Define

$$\tilde{A} = A^T \quad \text{and} \quad \tilde{C} = B^T$$

and consider the pair $(\tilde{C}, \tilde{A})$. The observability matrix associated with this pair is

$$Q_o(\tilde{C}, \tilde{A}) = \begin{bmatrix} \tilde{C} \\ \tilde{C}\tilde{A} \\ \vdots \\ \tilde{C}\tilde{A}^{n-1} \end{bmatrix} = \begin{bmatrix} B^T \\ B^T A^T \\ \vdots \\ B^T (A^{n-1})^T \end{bmatrix} = Q_o(B^T, A^T)$$

where again we write $Q_o(C, A)$ to indicate that the observability matrix depends on C and A. Note that

$$Q_c(A, B) = Q_o(B^T, A^T)^T$$

i.e., the controllability matrix of a pair (A, B) of real matrices is the transpose of the observability matrix
of the pair $(B^T, A^T)$. It is well known that the transpose of a finite-dimensional real matrix is its adjoint
and is a mapping between dual spaces. This is why controllability and observability are said to be duals of
each other.



Theorem 12.3.1 (Duality between controllability and observability) Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$.
The pair (A, B) is controllable if and only if the pair $(B^T, A^T)$ is observable.



Hence, any test for controllability of a pair is a test for observability of the dual pair and vice versa. It is a 
simple matter to verify that stabilizability and detectability are also duals of each other. 
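
The transpose relation behind duality is easy to check numerically on a small case. The sketch below uses made-up integer matrices and hypothetical helper names (`matmul`, `transpose`, `controllability_matrix`, `observability_matrix`). It assumes the usual block forms of the two matrices: B, AB, ..., A^(n-1)B side by side, and C, CA, ..., CA^(n-1) stacked.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def controllability_matrix(A, B):
    """Blocks B, AB, ..., A^(n-1) B placed side by side."""
    Q = [list(row) for row in B]
    block = B
    for _ in range(len(A) - 1):
        block = matmul(A, block)
        Q = [q_row + b_row for q_row, b_row in zip(Q, block)]
    return Q


def observability_matrix(C, A):
    """Blocks C, CA, ..., CA^(n-1) stacked on top of each other."""
    blocks, block = [], C
    for _ in range(len(A)):
        blocks.extend(block)
        block = matmul(block, A)
    return blocks


A = [[1, 2], [3, 4]]   # illustrative data
B = [[1], [0]]

Qc = controllability_matrix(A, B)
Qo_dual = observability_matrix(transpose(B), transpose(A))

print(Qc)                        # [[1, 1], [0, 3]]
print(Qo_dual)                   # [[1, 0], [1, 3]]
print(Qc == transpose(Qo_dual))  # True
```

Since a matrix and its transpose have the same rank, the rank test for controllability of (A, B) and the rank test for observability of the dual pair give the same answer.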